Highly Accurate Distributed Classification of Web Documents

Executive Summary

With the rapid growth of the internet, finding an efficient and accurate text classifier that can handle the huge volume of online documents is both a scientific challenge and a pressing economic need. This paper presents a distributed model for efficient web document classification. In the model, the distributed text classifiers are trained serially, with weights on the training instances that are set adaptively according to the classifiers' previous performance. Based on this distributed model, the paper also proposes Unequal Bagging (UBagging), an improved bagging technique for text classifiers. Experimental results show that the approach achieves higher classification accuracy than traditional centralized text classifiers while requiring less memory and computation time.
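The summary does not spell out how the instance weights are updated or how UBagging works, so the following is only a rough sketch of the general idea of serial training with adaptive instance weights. It assumes an AdaBoost-style update as a stand-in, documents represented as sets of words, labels of +1 or -1, and a single-word "stump" as the base classifier. All function names here (train_stump, reweight, train_serially, classify) are hypothetical and not taken from the paper.

```python
import math


def predict_stump(word, polarity, doc):
    """Predict `polarity` if the word occurs in the document, else the opposite label."""
    return polarity if word in doc else -polarity


def train_stump(docs, labels, weights):
    """Pick the (word, polarity) pair with the lowest weighted training error."""
    vocab = sorted(set().union(*docs))
    best = None
    for word in vocab:
        for polarity in (1, -1):
            err = sum(
                w for d, y, w in zip(docs, labels, weights)
                if predict_stump(word, polarity, d) != y
            )
            if best is None or err < best[0]:
                best = (err, word, polarity)
    return best


def reweight(weights, labels, preds, alpha):
    """AdaBoost-style update (an assumption): raise weights of misclassified instances."""
    raw = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, labels, preds)]
    total = sum(raw)
    return [w / total for w in raw]


def train_serially(docs, labels, rounds):
    """Train classifiers one after another; each sees weights shaped by its predecessors."""
    n = len(docs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, word, polarity = train_stump(docs, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, word, polarity))
        preds = [predict_stump(word, polarity, d) for d in docs]
        weights = reweight(weights, labels, preds, alpha)
    return ensemble


def classify(ensemble, doc):
    """Combine the classifiers by a vote weighted with each classifier's alpha."""
    score = sum(a * predict_stump(w, p, doc) for a, w, p in ensemble)
    return 1 if score >= 0 else -1
```

In a distributed setting, each round's classifier could in principle live on a different node, with only the instance weights passed along between rounds; that placement is an assumption of this sketch, not a detail confirmed by the summary.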

  • Format: PDF
  • Size: 150.2 KB