Feature Selection with LSI &PDDP for Machine Learning KNN Classification
Text categorization has become one of the key techniques for handling and organizing text data. In practical text classification tasks, the ability to interpret the classification results is an important as the ability to classify exactly. This paper will focus on the feature selection, for reducing the dimensionality of the vectors. The authors propose a new algorithm capable of partitioning a set of documents or other samples based on an embedding in a high-dimensional Euclidean space i.e. in which every document is a vector of real numbers, and then they apply classification techniques like KNN for categorizations the data and finally evaluate the results by using precisions, etc.