User Based Record Retrieval for Text Classification
Feature selection is an important preprocessing step for high-dimensional problems such as text categorization. Text categorization is a fundamental technique for mining massive amounts of textual data. The problem is high-dimensional, and most machine learning algorithms do not perform well when every term in the corpus is used as a feature. Feature selection removes irrelevant and redundant terms from the corpus, which increases both the efficiency and the effectiveness of the learning techniques. Categorizing documents in a natural language such as English is made more challenging by phenomena such as polysemy (one word with several meanings) and synonymy (several words with the same meaning). In particular, it has been observed that because writing styles differ from person to person, different words are used across documents to convey the same meaning.
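As a concrete illustration of what removing irrelevant terms can look like, the following minimal sketch scores each term by a chi-square statistic against one class and keeps the top-scoring terms. The chi-square criterion, the function names chi_square_scores and select_top_terms, and the four toy documents are assumptions chosen for illustration only; the text above does not state which selection criterion is used.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Score each term by its chi-square statistic against one target class.

    Illustrative assumption: terms are whitespace-separated, lowercased tokens,
    and only presence/absence of a term in a document is counted.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos
    df_pos = Counter()  # target-class documents containing the term
    df_neg = Counter()  # other-class documents containing the term
    for text, y in zip(docs, labels):
        terms = set(text.lower().split())
        (df_pos if y == target else df_neg).update(terms)

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # term present, target class
        b = df_neg[term]   # term present, other classes
        c = n_pos - a      # term absent, target class
        d = n_neg - b      # term absent, other classes
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A term found in every document carries no class information.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_terms(docs, labels, target, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    scores = chi_square_scores(docs, labels, target)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: two sport documents, two finance documents.
docs = [
    "the match ended with a late goal",
    "the striker scored the winning goal",
    "the market fell on weak earnings",
    "the bank reported strong earnings",
]
labels = ["sport", "sport", "finance", "finance"]

scores = chi_square_scores(docs, labels, target="sport")
top_terms = select_top_terms(docs, labels, target="sport", k=2)
print(top_terms)
```

In this toy corpus the word "the" occurs in every document and receives a score of zero, so it would be discarded, while "goal" and "earnings" each separate the two classes perfectly and are retained. A purely statistical criterion like this one treats synonyms as unrelated terms, which is one reason the synonymy problem described above is difficult.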