Reasoning With Missing Values in Multi Attribute Datasets

Missing data in a dataset can degrade classifier performance, which makes it harder to extract useful information from the data. The dataset used in this study consists of student records from a university system and contains some missing values. Three techniques are used to handle these missing values: listwise deletion, mean/mode imputation, and KNN imputation, each of which yields a separate processed dataset. The C4.5 classification algorithm is then applied to each resulting dataset individually. This work compares the three methods for handling missing data on the basis of the classification accuracy achieved by C4.5. The Weka data mining tool is used for the experimental analysis.
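
The three techniques named above can be illustrated with a minimal Python sketch. This is not the study's Weka workflow: the records, the attribute names (`marks`, `attendance`, `grade`), the value `k=2`, and the choice of distance (Euclidean over the numeric attributes both records have observed) are all assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean

# Hypothetical student records; None marks a missing value.
records = [
    {"marks": 72.0, "attendance": 90.0, "grade": "B"},
    {"marks": None, "attendance": 85.0, "grade": "B"},
    {"marks": 55.0, "attendance": None, "grade": "C"},
    {"marks": 91.0, "attendance": 95.0, "grade": None},
    {"marks": 68.0, "attendance": 80.0, "grade": "B"},
]

def listwise_deletion(rows):
    """Drop every record that has at least one missing value."""
    return [dict(r) for r in rows if all(v is not None for v in r.values())]

def mean_mode_imputation(rows):
    """Fill numeric gaps with the column mean, nominal gaps with the column mode."""
    filled = [dict(r) for r in rows]
    for col in rows[0]:
        observed = [r[col] for r in rows if r[col] is not None]
        if all(isinstance(v, (int, float)) for v in observed):
            fill = mean(observed)
        else:
            fill = Counter(observed).most_common(1)[0][0]
        for r in filled:
            if r[col] is None:
                r[col] = fill
    return filled

def knn_imputation(rows, k=2):
    """Fill each gap from the k nearest records that have that attribute observed."""
    numeric = [c for c in rows[0]
               if any(isinstance(r[c], (int, float)) for r in rows)]

    def distance(a, b):
        # Assumed metric: Euclidean over numeric attributes observed in both records.
        shared = [c for c in numeric if a[c] is not None and b[c] is not None]
        if not shared:
            return float("inf")
        return (sum((a[c] - b[c]) ** 2 for c in shared) / len(shared)) ** 0.5

    filled = [dict(r) for r in rows]
    for i, r in enumerate(rows):
        for col, value in r.items():
            if value is not None:
                continue
            candidates = [o for j, o in enumerate(rows)
                          if j != i and o[col] is not None]
            donors = sorted(candidates, key=lambda o: distance(r, o))[:k]
            values = [d[col] for d in donors]
            if col in numeric:
                filled[i][col] = mean(values)  # average of the neighbours
            else:
                filled[i][col] = Counter(values).most_common(1)[0][0]  # majority vote
    return filled

complete_only = listwise_deletion(records)
mean_mode = mean_mode_imputation(records)
knn = knn_imputation(records, k=2)
```

Each function returns a new list and leaves `records` unchanged, so the three processed datasets can be passed separately to a classifier and compared on accuracy, as the study does with C4.5.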

Provided by: International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE)
Topic: Data Management
Date Added: May 2013
Format: PDF
