A Cluster Based Approach with N-Grams at Word Level for Document Classification

Provided by: International Journal of Computer Applications Topic: Data Management Format: PDF
The rapid progress of computers and the web makes it easier to collect and store large amounts of information in the form of text, e.g., reviews, forum postings, blogs, web pages, news articles and email messages. In text mining, the growing size of text datasets and the high dimensionality associated with natural language pose a great challenge, making it difficult to classify documents into various categories and sub-categories. This paper focuses on a cluster-based document classification technique in which the data inside each cluster shares some common trait.
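The abstract does not describe the paper's actual algorithm, so the following is only a minimal sketch of the general idea: documents are represented by their word-level n-grams and grouped so that each cluster shares some common n-grams. The function names (`word_ngrams`, `jaccard`, `cluster_documents`), the Jaccard similarity measure, the greedy single-pass grouping, and the `threshold` value are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def word_ngrams(text: str, n: int = 2) -> Set[NGram]:
    """Return the set of word-level n-grams (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: Set[NGram], b: Set[NGram]) -> float:
    """Jaccard similarity of two n-gram sets; 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_documents(docs: List[str], n: int = 2,
                      threshold: float = 0.2) -> List[List[int]]:
    """Greedy single-pass clustering (an illustrative assumption).

    Each document joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns clusters as lists of document indices.
    """
    grams = [word_ngrams(d, n) for d in docs]
    clusters: List[List[int]] = []
    for i, g in enumerate(grams):
        for cluster in clusters:
            if jaccard(g, grams[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([i])
    return clusters


docs = [
    "the stock market fell today",
    "the stock market rose today",
    "heavy rain is expected tomorrow",
]
clusters = cluster_documents(docs)
```

Here the first two documents share the bigrams "the stock" and "stock market", so they fall into the same cluster, while the third shares none and forms its own. A real system would likely use a weighting scheme and a more robust clustering method than this single-pass grouping.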
