Binary Information Press
The volume of text documents in electronic format has been rising exponentially in recent years. There is therefore a growing need to organize these documents. One effective technique for this task is to incrementally learn from training documents and classify documents into categories. In this paper, the authors propose an text categorization algorithm, called CNNTC, to adapt KNN to incremental learning. They first employ an unsupervised incremental clustering algorithm to construct an updatable cluster-based categorization model. The KNN decision rule is then used to classify test documents. Extensive experiments show that CNNTC is an effective updatable text categorization algorithm, which is able to outperform KNN and SVM on both English and Chinese corpora.