The Chinese Text Categorization System With Category Priorities

Executive Summary

Text categorization requires some understanding of the content of the documents, some prior knowledge of the categories, or both. To handle document content, the authors apply a filtering measure for feature selection in their Chinese text categorization system, and they modify the Term Frequency-Inverse Document Frequency (TF-IDF) formula to raise the weights of important keywords and lower the weights of unimportant ones. To capture knowledge of the categories, they use category priority to represent the relationship between two different categories. The experimental results show that their method both reduces noisy text and improves the accuracy and recall of text categorization.
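
For readers unfamiliar with the baseline, the following is a minimal sketch of standard TF-IDF weighting followed by a simple threshold filter for feature selection. It is not the authors' modified formula or their filtering measure, neither of which is given in this summary. The function names `tf_idf` and `filter_features`, the threshold value, and the assumption that each document is already segmented into a list of tokens are all illustrative choices made here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Standard TF-IDF only: tf = count / document length,
    idf = log(N / document frequency). The paper's modified
    formula is NOT reproduced here.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def filter_features(weights, threshold):
    """Keep only terms whose weight reaches the (hypothetical) threshold."""
    return [
        {term: w for term, w in doc.items() if w >= threshold}
        for doc in weights
    ]


# Toy corpus; real Chinese text would first need word segmentation.
docs = [["stock", "market", "rises"],
        ["football", "match", "market"],
        ["stock", "fund", "stock"]]
selected = filter_features(tf_idf(docs), threshold=0.2)
```

In this sketch a term that appears in every document receives a weight of zero, which is the usual way plain TF-IDF suppresses uninformative keywords; the authors' modification is described as strengthening this contrast further.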

  • Format: PDF
  • Size: 480.68 KB