Institute of Electrical and Electronics Engineers
Today a huge amount of information is associated with web technology and the internet. To gather useful information from it, this text has to be categorized. Text Categorization (TC) is the task of automatically assigning each document to one or more categories from a predefined set. It is a pattern classification task in text mining and is necessary for the efficient management of textual information systems. Documents can be classified in three ways: by unsupervised, supervised, or semi-supervised methods. This paper presents a comparative study of different approaches to text categorization.
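To make the supervised case concrete, the following is a minimal sketch of a text categorizer, assuming a multinomial Naive Bayes model with add-one smoothing and whitespace tokenization. The technique, the function names `train` and `classify`, and the four toy training documents are illustrative assumptions; they are not taken from the approaches compared in this paper.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Build a model from (text, category) pairs (hypothetical helper)."""
    word_counts = defaultdict(Counter)  # category -> word frequencies
    doc_counts = Counter()              # category -> number of documents
    vocab = set()
    for text, category in labeled_docs:
        words = text.lower().split()    # naive whitespace tokenization
        word_counts[category].update(words)
        doc_counts[category] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the predefined category with the highest log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in doc_counts:
        # Log prior: fraction of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothed word likelihood.
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Toy labeled data, invented for illustration only.
training = [
    ("the team won the match", "sports"),
    ("the player scored a goal", "sports"),
    ("the election results were announced", "politics"),
    ("the minister gave a speech", "politics"),
]
model = train(training)
print(classify("the player won", model))
print(classify("the minister announced the results", model))
```

The categories are fixed in advance by the labeled examples, which is what distinguishes this supervised setting from unsupervised methods, where no labels are available.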