Highly Accurate Distributed Classification of Web Documents

Date Added: May 2009
Format: PDF

With the rapid growth of the Internet, finding an efficient and accurate text classifier that can handle very large volumes of online documents is both a scientific challenge and a pressing economic need. This paper presents a distributed model for efficient web document classification. In the model, the distributed text classifiers are trained serially, with weights on the training instances that are set adaptively according to the performance of the previously trained classifiers. Based on the distributed model, the paper also proposes Unequal Bagging (UBagging), an improved bagging technique for text classifiers. Experimental results show that the approach can achieve higher classification accuracy than traditional centralized text classifiers while requiring less memory and computation time.
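The abstract does not give the details of the training procedure or of UBagging, so the following is only a minimal Python sketch of the general idea of training classifiers serially with adaptively weighted training instances. It uses an AdaBoost-style weight update as a stand-in, not the paper's method. The keyword "stump" classifier, the function names (train_stump, train_serial, classify), and the toy documents are all assumptions made for illustration.

```python
import math

def predict_stump(word, polarity, doc):
    """Predict `polarity` if the word occurs in the document, else the opposite."""
    return polarity if word in doc else -polarity

def train_stump(docs, labels, weights):
    """Pick the (word, polarity) rule with the lowest weighted error."""
    vocab = sorted({w for d in docs for w in d})
    best = None
    for word in vocab:
        for polarity in (1, -1):
            err = sum(wt for d, y, wt in zip(docs, labels, weights)
                      if predict_stump(word, polarity, d) != y)
            if best is None or err < best[0]:
                best = (err, word, polarity)
    return best

def train_serial(docs, labels, rounds=3):
    """Train classifiers one after another, reweighting the instances each round.

    Assumption: an AdaBoost-style update stands in for the paper's
    (unspecified) adaptive weighting scheme.
    """
    n = len(docs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, word, polarity = train_stump(docs, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, word, polarity))
        # Misclassified instances gain weight; correct ones lose weight.
        weights = [wt * math.exp(-alpha * y * predict_stump(word, polarity, d))
                   for d, y, wt in zip(docs, labels, weights)]
        total = sum(weights)
        weights = [wt / total for wt in weights]
    return ensemble

def classify(ensemble, doc):
    """Combine the classifiers by a vote weighted with each classifier's alpha."""
    score = sum(alpha * predict_stump(word, polarity, doc)
                for alpha, word, polarity in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): documents as sets of tokens, +1 = sports, -1 = finance.
docs = [{"goal", "match"}, {"team", "goal"}, {"stock", "market"}, {"market", "bank"}]
labels = [1, 1, -1, -1]
ensemble = train_serial(docs, labels)
```

In a distributed setting, each round's classifier could be trained on a different node using only its share of the data. How the paper partitions the data and how UBagging differs from ordinary bagging are not stated in the abstract, so the sketch leaves both out.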