Binary Information Press
Classification based on association rules is a common and easily understood approach to text classification. The key to improving its accuracy is generating more effective rules. However, rule generation can overstate the role of some training texts. To avoid this and generate more effective rules, this paper defines the Classification Path and proposes a new text classifier, the CP-tree, based on the CR-tree. When a new text arrives, association rules are generated by scanning the Classification Path, which avoids this overstatement. In addition, to make the role of association rules in class prediction more reasonable, the rules and their weights for the new text are derived not only from the training texts but also from the new text itself.
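For readers unfamiliar with the baseline, the following is a minimal sketch of generic association-rule text classification, not of the CP-tree or the Classification Path proposed here. It assumes rules of the form "term set implies class", filtered by support and confidence, and predicts by summing the confidence of matching rules. The function names `mine_rules` and `classify`, the thresholds, and the toy training data are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_rules(train, min_support=0.2, min_conf=0.6, max_len=2):
    """Return {(term_tuple, label): confidence} for term sets up to max_len terms.

    train is a list of (set_of_terms, label) pairs. Thresholds are illustrative.
    """
    n = len(train)
    itemset_count = Counter()  # how many texts contain the term set
    rule_count = Counter()     # how many texts contain it AND carry the label
    for terms, label in train:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(terms), k):
                itemset_count[combo] += 1
                rule_count[(combo, label)] += 1

    rules = {}
    for (combo, label), count in rule_count.items():
        support = count / n
        confidence = count / itemset_count[combo]
        if support >= min_support and confidence >= min_conf:
            rules[(combo, label)] = confidence
    return rules


def classify(terms, rules):
    """Predict the class whose matching rules have the largest total confidence."""
    scores = defaultdict(float)
    for (combo, label), confidence in rules.items():
        if set(combo) <= terms:  # the rule fires if all its terms occur in the text
            scores[label] += confidence
    return max(scores, key=scores.get) if scores else None


# Toy training set (hypothetical data).
train = [
    ({"goal", "match", "team"}, "sport"),
    ({"goal", "team", "coach"}, "sport"),
    ({"stock", "market", "price"}, "finance"),
    ({"stock", "price", "bank"}, "finance"),
]
rules = mine_rules(train)
prediction = classify({"team", "goal", "win"}, rules)
```

In this sketch every rule is mined from the training texts alone and applied with a fixed weight, which is the setting the abstract contrasts with: the proposed classifier instead derives rules and weights with reference to the new text as well.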