Clustering Technique in Data Mining for Text Documents

Date Added: Jan 2012
Format: PDF

Feature selection improves the efficiency and accuracy of text categorization algorithms by removing redundant and irrelevant terms from the data warehouse. This paper proposes a semantic clustering and feature selection method that uses the semantic relations among text documents to improve both clustering and feature selection. It also proposes a new text clustering algorithm, TCFS (Text Clustering with Feature Selection). TCFS can incorporate CHIR, a new supervised feature selection method, to identify relevant features (i.e., terms) iteratively, so that clustering becomes a learning process.
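The idea of alternating clustering with supervised feature selection can be illustrated with a small sketch. This is not the paper's TCFS or CHIR implementation: it assumes a plain chi-square statistic as a stand-in for CHIR, a k-means-style assignment by cosine similarity, and hypothetical names throughout (`chi_square`, `select_terms`, `cluster_with_selection`, `top_m`, `seeds`). Documents are assumed to be term-count dictionaries, and the current cluster labels are treated as class labels for scoring terms.

```python
import math
from collections import Counter


def chi_square(docs, labels, term, cluster):
    """Chi-square statistic of the 2x2 table (term present?, doc in cluster?)."""
    n = len(docs)
    a = sum(1 for d, l in zip(docs, labels) if term in d and l == cluster)
    b = sum(1 for d, l in zip(docs, labels) if term in d and l != cluster)
    c = sum(1 for d, l in zip(docs, labels) if term not in d and l == cluster)
    e = n - a - b - c  # term absent, doc outside the cluster
    denom = (a + b) * (c + e) * (a + c) * (b + e)
    return 0.0 if denom == 0 else n * (a * e - b * c) ** 2 / denom


def select_terms(docs, labels, k, top_m):
    """Score each term by its best chi-square value over clusters; keep top_m."""
    vocab = sorted({t for d in docs for t in d})
    score = {t: max(chi_square(docs, labels, t, c) for c in range(k)) for t in vocab}
    return set(sorted(vocab, key=lambda t: (-score[t], t))[:top_m])


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def centroid(vectors):
    """Mean of sparse vectors; an empty cluster yields an empty centroid."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: x / len(vectors) for t, x in total.items()}


def cluster_with_selection(docs, k, top_m, seeds, rounds=5):
    """Alternate cluster assignment with chi-square feature selection."""
    features = {t for d in docs for t in d}  # start from the full vocabulary
    centroids = [dict(docs[i]) for i in seeds]
    labels = None
    for _ in range(rounds):
        # Restrict documents and centroids to the currently selected terms.
        proj = [{t: x for t, x in d.items() if t in features} for d in docs]
        cents = [{t: x for t, x in c.items() if t in features} for c in centroids]
        new_labels = [max(range(k), key=lambda j: cosine(p, cents[j])) for p in proj]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Treat cluster labels as classes and re-select the relevant terms.
        features = select_terms(docs, labels, k, top_m)
        centroids = [centroid([d for d, l in zip(docs, labels) if l == j])
                     for j in range(k)]
    return labels, features


# Toy corpus (made up for illustration): two sports and two finance documents.
docs = [
    {"goal": 2, "match": 1, "the": 1},
    {"goal": 1, "team": 2, "the": 1},
    {"stock": 2, "market": 1, "the": 1},
    {"stock": 1, "bank": 2, "the": 1},
]
labels, features = cluster_with_selection(docs, k=2, top_m=2, seeds=[0, 2])
print(labels, sorted(features))  # [0, 0, 1, 1] ['goal', 'stock']
```

In this toy run the term "the" occurs in every document, so its chi-square score is zero and it is dropped, while "goal" and "stock" separate the two clusters perfectly and are kept. Real corpora would need tokenization, term weighting, and a more careful choice of seeds and `top_m`.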