Date Added: Nov 2011
Web Page Classification (WPC) is an important and challenging topic in data mining. WPC can help users obtain usable information from the huge internet dataset automatically and efficiently. Much effort has been devoted to WPC, but current approaches still leave room for improvement. One particular challenge in training classifiers is that the available dataset is usually unbalanced. Standard machine learning algorithms tend to be overwhelmed by the majority class and ignore the minority class, which leads to a high false negative rate.
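
The effect of class imbalance on the false negative rate can be illustrated with a minimal sketch. It uses hypothetical toy labels, not the data or method of this work: the 95/5 class split and all function names are assumptions chosen for illustration, and the minority class is treated as the positive class. A predictor that always outputs the majority class looks accurate while missing every minority-class page.

```python
# Hypothetical toy labels: 95 majority-class pages (0) and 5 minority-class pages (1).
y_true = [0] * 95 + [1] * 5


def majority_class_predictor(labels):
    """Return a predictor that always outputs the most frequent label."""
    majority = max(set(labels), key=labels.count)
    return lambda _page: majority


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def false_negative_rate(y_true, y_pred):
    """FN / (FN + TP): share of actual positives predicted as negative."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fn / (fn + tp) if (fn + tp) else 0.0


predict = majority_class_predictor(y_true)
y_pred = [predict(None) for _ in y_true]  # the input page is ignored by design

acc = accuracy(y_true, y_pred)
fnr = false_negative_rate(y_true, y_pred)
print(f"accuracy={acc:.2f}, false negative rate={fnr:.2f}")
```

On these toy labels the accuracy is 0.95 while the false negative rate is 1.00, which is why accuracy alone is a misleading measure on unbalanced data.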