Text Categorization Research Based on Cluster Idea
Classification and clustering are frequently used methods in data mining. The entropy model is the basic structure underlying automated word categorization. In this paper, words that frequently appear consecutively are placed in different groups. Although this method is not always correct, in most cases the results obtained approximate the real situation. Text-based matching is performed to generate "soft" seeds, which are then used to guide clustering in the basic feature space. Because clustering problems are NP-complete, no efficient exact algorithm is known for the entropy model, so a number of heuristic algorithms have been developed to solve this problem.
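
To make the heuristic idea concrete, the following is a minimal sketch of one common kind of entropy-driven word clustering: a greedy exchange procedure that moves one word at a time to whichever class most increases the mutual information between the classes of adjacent words. Under a class-based bigram model, increasing this quantity lowers the model entropy. The choice of a class-based bigram model, the exchange strategy, and the names `class_mutual_information` and `exchange_clustering` are all assumptions made for illustration; they are not taken from this paper, and the "soft" seed step is not modeled here.

```python
import math
from collections import Counter


def class_mutual_information(bigram_counts, assignment):
    """Mutual information (in bits) between the classes of adjacent words."""
    pair, left, right = Counter(), Counter(), Counter()
    total = 0
    for (w1, w2), n in bigram_counts.items():
        c1, c2 = assignment[w1], assignment[w2]
        pair[(c1, c2)] += n
        left[c1] += n
        right[c2] += n
        total += n
    mi = 0.0
    for (c1, c2), n in pair.items():
        mi += (n / total) * math.log2(n * total / (left[c1] * right[c2]))
    return mi


def exchange_clustering(tokens, num_classes, max_passes=10):
    """Greedy heuristic: a local optimum only, not an exact solution."""
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    # Order the vocabulary by descending frequency, ties broken alphabetically.
    freq = Counter(tokens)
    vocab = sorted(freq, key=lambda w: (-freq[w], w))
    # Hypothetical initialisation: spread words round-robin over the classes.
    assignment = {w: i % num_classes for i, w in enumerate(vocab)}
    best = class_mutual_information(bigram_counts, assignment)

    for _ in range(max_passes):
        moved = False
        for w in vocab:
            original = assignment[w]
            best_class = original
            for c in range(num_classes):
                if c == original:
                    continue
                assignment[w] = c
                score = class_mutual_information(bigram_counts, assignment)
                if score > best + 1e-12:
                    best, best_class = score, c
            # Keep the best class found for this word (possibly the original).
            assignment[w] = best_class
            moved = moved or best_class != original
        if not moved:
            break  # no single-word move improves the objective
    return assignment, best


if __name__ == "__main__":
    text = "the cat sat the dog sat the cat ran the dog ran".split()
    classes, mi = exchange_clustering(text, num_classes=2)
    print(classes, round(mi, 3))
```

Each pass recomputes the objective from scratch for every candidate move, which keeps the sketch short but is far slower than the incremental updates a practical implementation would use. Because the search is greedy, the result depends on the initial assignment and the visiting order.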