International Journal of Computer Science & Engineering Technology (IJCSET)
The proposed system introduces a new similarity-based Genetic Algorithm (GA) and new thresholding strategies (R&SCut variants). The GA is designed to assign appropriate weights to terms according to their semantic content and importance, using their co-occurrence information and discriminating power values in the similarity computation. After investigating the existing common thresholding strategies, the system is designed for multiclass text categorization, in which a document may belong to a variable number of categories. Extensive comparative experiments were conducted with the proposed system on two standard text collections (Reuters-21578 and 20-Newsgroups).
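To make the role of thresholding concrete, the following minimal sketch shows the two standard strategies that the R&SCut variants build on, as they are commonly described in the text categorization literature: RCut keeps a fixed number of top-ranked categories per document, while SCut keeps every category whose score reaches a per-category threshold. It does not reproduce the variants proposed here. The function names, the category scores, and the threshold values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical type: category name -> similarity score for a single document.
Scores = Dict[str, float]


def rcut(scores: Scores, k: int) -> List[str]:
    """Rank-based cut: keep the k highest-scoring categories.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:k]


def scut(scores: Scores, thresholds: Dict[str, float]) -> List[str]:
    """Score-based cut: keep each category whose score reaches its own threshold.

    In practice the thresholds would be tuned per category on held-out data;
    here they are simply given (illustrative values, not from the paper).
    """
    return sorted(c for c, s in scores.items() if s >= thresholds[c])


# Illustrative scores and thresholds (assumed values).
doc_scores = {"earn": 0.82, "acq": 0.40, "grain": 0.35}
category_thresholds = {"earn": 0.50, "acq": 0.45, "grain": 0.30}

rcut_result = rcut(doc_scores, k=1)                   # exactly one category
scut_result = scut(doc_scores, category_thresholds)   # variable number of categories
```

With these assumed values, RCut always returns exactly `k` categories, whereas SCut can return zero, one, or several, which is the behavior needed when documents may belong to a variable number of categories.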