An Effective Rule-Based Probabilistic Classifier for Text Mining
The authors define a new probabilistic classifier for text mining that uses the ODP taxonomy, a domain ontology, and datasets to cluster text documents and identify the category of a given document. For each term set or pattern identified in the document, their algorithm calculates a positive probability value and a negative probability value. Based on these values, the classifier indexes the document to the relevant group of the cluster. They use the Reuters dataset, which contains more than 30 categories, but evaluate their algorithm on only ten of them. The evaluation uses 60 percent of the dataset as the training set and 40 percent as the testing set. According to the authors, the algorithm reduces both false indexing and overlap.
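The scoring idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not give their formulas, so the definitions here are assumptions. The positive value of a term for a category is taken as the share of training documents in that category containing the term, the negative value as the share of training documents outside the category containing it, and a document is assigned to the category with the largest summed difference. The function names (`split_60_40`, `term_probabilities`, `classify`) and the toy documents are hypothetical, and the ODP taxonomy and ontology steps are omitted.

```python
def split_60_40(docs):
    """Split a list into the first 60 percent (training) and the rest (testing)."""
    cut = (len(docs) * 6) // 10
    return docs[:cut], docs[cut:]


def term_probabilities(training):
    """Return {category: {term: (positive, negative)}}.

    Assumed definitions (not taken from the paper):
      positive = share of documents in the category that contain the term
      negative = share of documents outside the category that contain the term
    """
    categories = {cat for _, cat in training}
    vocab = {term for terms, _ in training for term in terms}
    probs = {}
    for cat in categories:
        inside = [set(terms) for terms, c in training if c == cat]
        outside = [set(terms) for terms, c in training if c != cat]
        probs[cat] = {}
        for term in vocab:
            pos = sum(term in doc for doc in inside) / len(inside)
            neg = sum(term in doc for doc in outside) / len(outside) if outside else 0.0
            probs[cat][term] = (pos, neg)
    return probs


def classify(terms, probs):
    """Pick the category with the largest sum of (positive - negative) over the terms."""
    def score(cat):
        # Terms never seen in training contribute nothing to the score.
        pairs = (probs[cat].get(term, (0.0, 0.0)) for term in set(terms))
        return sum(pos - neg for pos, neg in pairs)
    # sorted() makes tie-breaking deterministic (alphabetical order).
    return max(sorted(probs), key=score)


# Toy documents (hypothetical), each a (term list, category) pair.
docs = [
    (["oil", "price", "barrel"], "crude"),
    (["wheat", "harvest", "export"], "grain"),
    (["oil", "opec", "export"], "crude"),
    (["wheat", "corn", "price"], "grain"),
    (["oil", "barrel"], "crude"),
]

train_docs, test_docs = split_60_40(docs)
probs = term_probabilities(train_docs)
predictions = [classify(terms, probs) for terms, _ in test_docs]
```

With these toy documents, the first three serve as training data and the last two as test data. Subtracting the negative value penalizes terms that are also common in other categories, which is one plausible way to reduce overlap between categories; the authors' actual rule may differ.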