Date Added: Oct 2011
In order to save time in extracting specific information from high volume of data in web documents, this paper proposes an architectural model of generic web document classification system using design patterns for classifying web documents. This work implements two classification techniques for classifying Thai web documents, namely centroid classification and neural network classification, based on the proposed model and compares their classification effectiveness empirically. The training data sets in this experiment consist of 500 web documents of the following five categories (100 documents for each category): mobile phone sales, book sales, travel sales, education information and company profile. Another two hundred and fifty web documents were then used to test the two classifiers.