Journal of Computing
Pattern classification has been successfully applied in many problem domains, such as biometric recognition, document classification, and medical diagnosis. Missing or unknown data are a common data-quality problem in pattern classification. Such missing data are generally either ignored or simply imputed, which can degrade classification performance. The authors applied two methods, K-nearest neighbor and probabilistic principal component analysis, to impute the missing values of patterns. In the K-nearest neighbor method, missing data are imputed using values from the K most similar cases.
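
The K-nearest neighbor imputation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `knn_impute`, the use of Euclidean distance over the observed features, the restriction of donors to complete cases, and the mean as the aggregation rule are all assumptions made here for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows with None entries filled from the k nearest complete rows.

    Hypothetical helper for illustration only.
    """
    # Assumption: only fully observed rows are used as donors.
    complete = [r for r in rows if all(v is not None for v in r)]
    if not complete:
        raise ValueError("need at least one complete row to impute from")

    imputed = []
    for row in rows:
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            imputed.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]

        # Assumption: similarity is Euclidean distance over the features
        # that are observed in the incomplete row.
        def dist(donor):
            return math.sqrt(sum((row[j] - donor[j]) ** 2 for j in observed))

        neighbors = sorted(complete, key=dist)[:k]

        # Assumption: each missing feature is replaced by the neighbors' mean.
        filled = list(row)
        for j in missing:
            filled[j] = sum(n[j] for n in neighbors) / len(neighbors)
        imputed.append(filled)
    return imputed


# Toy data: the last pattern is missing its second feature.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [8.0, 9.0, 10.0],
    [1.05, None, 3.1],
]
result = knn_impute(data, k=2)
```

In this toy data the incomplete pattern is closest to the first two rows, so its missing feature is filled with their mean. Other distance measures, weighting schemes, or donor-selection rules are equally plausible; the source does not specify which the authors used.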