Neighborhood Component Feature Selection for High-Dimensional Data
Feature selection is of considerable importance in data mining and machine learning, especially for high-dimensional data. In this paper, the authors propose a novel nearest neighbor-based feature weighting algorithm, which learns a feature weighting vector by maximizing the expected leave-one-out classification accuracy, penalized by a regularization term. The algorithm makes no parametric assumptions about the distribution of the data and scales naturally to multiclass problems. Experiments conducted on artificial and real data sets demonstrate that the proposed algorithm is largely insensitive to increases in the number of irrelevant features and performs better than state-of-the-art methods in most cases.
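To make the idea concrete, the following is a minimal sketch of one plausible reading of such an objective, not the authors' implementation. The abstract does not specify the details, so everything below is an assumption made for illustration: a weighted L1 distance with per-feature weights w_l**2, an exponential kernel of width sigma that turns distances into reference-point probabilities, an L2 penalty with strength lam, plain fixed-step gradient ascent, and the hypothetical function names objective_and_gradient and fit_feature_weights. The data are synthetic.

```python
import numpy as np


def objective_and_gradient(w, X, y, sigma=1.0, lam=0.01):
    """Regularized expected leave-one-out accuracy and its gradient (sketch)."""
    # absdiff[i, j, l] = |x_il - x_jl|
    absdiff = np.abs(X[:, None, :] - X[None, :, :])
    # Assumed distance: weighted L1 with nonnegative weights w_l**2.
    dist = absdiff @ (w ** 2)
    np.fill_diagonal(dist, np.inf)  # leave-one-out: a point never picks itself
    dist -= dist.min(axis=1, keepdims=True)  # shift rows for numerical stability
    kernel = np.exp(-dist / sigma)
    p = kernel / kernel.sum(axis=1, keepdims=True)  # p[i, j]: i picks j as reference
    same = (y[:, None] == y[None, :]).astype(float)
    p_correct = (p * same).sum(axis=1)  # probability that point i is classified correctly

    value = p_correct.mean() - lam * np.sum(w ** 2)

    # p-weighted absolute differences over all neighbors and same-class neighbors.
    all_term = np.einsum("ij,ijl->il", p, absdiff)
    same_term = np.einsum("ij,ijl->il", p * same, absdiff)
    data_grad = (p_correct[:, None] * all_term - same_term).mean(axis=0)
    grad = 2.0 * w * (data_grad / sigma - lam)
    return value, grad


def fit_feature_weights(X, y, sigma=1.0, lam=0.01, step=0.5, n_iter=200):
    """Fixed-step gradient ascent on the objective above, starting from all-ones weights."""
    w = np.ones(X.shape[1])
    for _ in range(n_iter):
        _, grad = objective_and_gradient(w, X, y, sigma, lam)
        w += step * grad
    return w


# Synthetic example: feature 0 carries the class signal, features 1-4 are pure noise.
rng = np.random.default_rng(0)
n = 100
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 5))
X[:, 0] = 2 * y - 1 + 0.1 * rng.normal(size=n)

weights = fit_feature_weights(X, y)
print(np.round(weights ** 2, 3))  # effective per-feature weights
```

In this sketch the squared entries of the returned vector act as the effective feature weights, and features whose weight falls close to zero would be the candidates to discard. The values of sigma, lam, step, and n_iter are arbitrary illustrative choices that would normally be tuned, for example by cross-validation.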