A Fast Clustering-Based Feature Subset Selection Algorithm
In this paper, the authors propose a fast clustering-based algorithm for eliminating irrelevant and redundant features. Feature selection is used to reduce the number of features in applications where the data have hundreds or thousands of features. Existing feature selection methods focus mainly on finding relevant features. The authors show that feature relevance alone is insufficient for efficient feature selection in high-dimensional data. They define feature redundancy and propose performing an explicit redundancy analysis as part of feature selection. They also introduce a new framework that decouples relevance analysis from redundancy analysis.
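The two-stage idea described above can be illustrated with a minimal Python sketch. Relevance analysis comes first and explicit redundancy analysis is run over the surviving features. The sketch assumes discrete-valued features and uses symmetric uncertainty (SU) as the correlation measure. Several pieces are illustrative assumptions rather than the authors' exact method: the function names, the relevance threshold, and the rule that two features are redundant when their mutual SU is at least each one's SU with the class. Grouping redundant features by connected components is also a simplified stand-in for the paper's clustering step.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a normalized correlation in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy     # I(X; Y)
    return 2.0 * mutual_info / (hx + hy)


def select_features(features, target, relevance_threshold=0.0):
    """Hypothetical helper. features: dict name -> list of discrete values;
    target: list of class labels. Returns the sorted names of selected features."""
    # Step 1 (relevance analysis): keep features whose SU with the class exceeds the threshold.
    relevance = {name: symmetric_uncertainty(col, target) for name, col in features.items()}
    relevant = [name for name, su in relevance.items() if su > relevance_threshold]

    # Step 2 (redundancy analysis): link two relevant features when they are at least as
    # correlated with each other as each is with the class (assumed redundancy rule).
    parent = {name: name for name in relevant}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(relevant):
        for b in relevant[i + 1:]:
            su_ab = symmetric_uncertainty(features[a], features[b])
            if su_ab >= relevance[a] and su_ab >= relevance[b]:
                parent[find(a)] = find(b)

    # Step 3: each connected group is a cluster; keep its most class-relevant feature.
    clusters = {}
    for name in relevant:
        clusters.setdefault(find(name), []).append(name)
    return sorted(max(members, key=lambda m: relevance[m]) for members in clusters.values())


if __name__ == "__main__":
    y = [0, 0, 1, 1, 0, 1, 0, 1]
    data = {
        "f1": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to the class: fully relevant
        "f2": [1, 1, 0, 0, 1, 0, 1, 0],  # inverted copy of f1: relevant but redundant
        "f3": [0, 1, 0, 1, 0, 1, 1, 0],  # independent of the class: irrelevant
        "f4": [0, 1, 0, 1, 0, 1, 0, 1],  # weakly relevant, not redundant with f1
    }
    print(select_features(data, y))  # → ['f1', 'f4']
```

In the toy data, f3 is dropped by the relevance step because its SU with the class is zero. f1 and f2 fall into one cluster because each fully determines the other, so only one of them survives. f4 is kept because it carries class information that is not duplicated by another selected feature.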