Date Added: Dec 2010
Feature selection is a fundamental problem in machine learning and data mining: choosing the features most relevant to the problem from a collected set is essential. This paper proposes a novel method that uses correlation coefficient clustering to remove similar or redundant features. The collected features are grouped into clusters according to their pairwise correlation coefficient values. Within each cluster, the most class-dependent feature is retained and the others are removed. The features that remain are therefore both strongly related to the class and largely uncorrelated with one another.
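
The abstract does not give the clustering procedure or the measure of class dependence, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes absolute Pearson correlation as the similarity between features, absolute Pearson correlation with a numeric class label as the measure of class dependence, and a simple greedy threshold clustering. The names `select_features` and `threshold` are hypothetical.

```python
import numpy as np

def select_features(X, y, threshold=0.8):
    """Return indices of retained features (illustrative sketch only).

    Assumptions not taken from the abstract: Pearson correlation,
    greedy threshold clustering, no constant (zero-variance) columns.
    """
    n_features = X.shape[1]
    # Absolute correlation between every pair of features (columns of X).
    feat_corr = np.abs(np.corrcoef(X, rowvar=False))
    # Absolute correlation of each feature with the class label.
    class_corr = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )

    unassigned = set(range(n_features))
    selected = []
    # Visit features from most to least class-dependent. Each unassigned
    # feature seeds a cluster, so the seed is the most class-dependent
    # member of the cluster it forms.
    for j in np.argsort(-class_corr):
        j = int(j)
        if j not in unassigned:
            continue
        # The cluster is the seed plus every unassigned feature that is
        # correlated with the seed at or above the threshold.
        cluster = {k for k in unassigned if feat_corr[j, k] >= threshold}
        cluster.add(j)
        unassigned -= cluster
        selected.append(j)  # keep the seed, drop the rest of the cluster
    return sorted(selected)
```

Calling `select_features(X, y)` on a samples-by-features array `X` returns one index per cluster. Lowering `threshold` merges more features into each cluster and so retains fewer of them; the value 0.8 is an arbitrary default chosen for illustration.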