International Journal of Advanced Research in Computer Science & Technology (IJARCST)
Feature subset clustering is a powerful technique for reducing the dimensionality of feature vectors in text classification. In this paper, the authors propose a similarity-based, self-constructing algorithm for feature clustering that uses a K-Means strategy. The words in the feature vector of a document set are grouped into clusters on the basis of a similarity test: words that are similar to each other are placed in the same cluster, and a head is assigned to each cluster. With the FAST algorithm, the derived membership functions closely match and properly describe the real distribution of the training data.
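
The grouping step described above can be illustrated with a minimal sketch. It is not the authors' FAST algorithm and does not compute membership functions. It only shows the general idea of similarity-based, self-constructing word clustering under several assumptions: each word is represented by a hypothetical vector of per-document counts, similarity is measured by cosine similarity, the `threshold` value is arbitrary, and the head of a cluster is taken to be the member closest to the cluster centroid. The names `cosine`, `cluster_words`, and `example_words` are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_words(word_vectors, threshold=0.8):
    """Group words incrementally; clusters are created on demand (self-constructing).

    word_vectors: dict mapping word -> list of counts (assumed representation).
    threshold: minimum similarity to join an existing cluster (assumed value).
    """
    clusters = []  # each cluster: {"members": [...], "centroid": [...]}
    for word, vec in word_vectors.items():
        # Similarity test: find the most similar existing cluster, if any passes.
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            # No cluster is similar enough, so this word starts a new one.
            clusters.append({"members": [word], "centroid": list(vec)})
        else:
            # Update the centroid as a running mean, as in a K-Means update step.
            n = len(best["members"])
            best["centroid"] = [
                (m * n + x) / (n + 1) for m, x in zip(best["centroid"], vec)
            ]
            best["members"].append(word)
    # Assumption: the head is the member most similar to the final centroid.
    for cluster in clusters:
        cluster["head"] = max(
            cluster["members"],
            key=lambda w: cosine(word_vectors[w], cluster["centroid"]),
        )
    return clusters


# Hypothetical toy data: counts of each word in four documents.
example_words = {
    "goal": [3, 2, 0, 0],
    "match": [2, 3, 0, 0],
    "vote": [0, 0, 4, 1],
    "election": [0, 0, 3, 2],
    "team": [4, 3, 0, 0],
}

if __name__ == "__main__":
    for cluster in cluster_words(example_words):
        print(cluster["head"], cluster["members"])
```

In this sketch the number of clusters is not fixed in advance; it grows whenever a word fails the similarity test against every existing cluster. A document could then be represented by one feature per cluster rather than one per word, which is where the dimensionality reduction comes from.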