Binary Information Press
Feature selection is one of the key factors influencing the development of web spam detection systems based on statistical learning. In this paper, in addition to content features and page-level link analysis features, the authors also extract host-level link analysis features. The effectiveness of these three feature groups is analyzed on the WEBSPAM-UK2006 benchmark. Experiments show that features from different perspectives differ in their discriminative ability and complement one another well. The best detection performance is achieved with the fused features.
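As a rough illustration of what fusing features from several perspectives might look like, the sketch below concatenates per-host feature vectors from three groups. It assumes simple concatenation (early fusion), which the abstract does not confirm as the authors' method. The function name `fuse_features`, the host names, and all numeric values are hypothetical and are not taken from WEBSPAM-UK2006.

```python
from typing import Dict, List


def fuse_features(*groups: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Concatenate per-host feature vectors from several feature groups.

    Only hosts present in every group are kept, so all fused vectors
    end up with the same length.
    """
    if not groups:
        return {}
    # Keep only the hosts that appear in every feature group.
    common = set(groups[0])
    for group in groups[1:]:
        common &= set(group)
    fused: Dict[str, List[float]] = {}
    for host in sorted(common):
        vector: List[float] = []
        for group in groups:
            vector.extend(group[host])
        fused[host] = vector
    return fused


# Hypothetical toy values, for illustration only.
content = {"a.example": [0.1, 0.2], "b.example": [0.9, 0.8]}
page_link = {"a.example": [3.0], "b.example": [40.0]}
host_link = {"a.example": [0.5], "b.example": [0.05], "c.example": [0.3]}

fused = fuse_features(content, page_link, host_link)
```

Here `c.example` is dropped because it lacks content and page-level features. The fused vectors could then be passed to any statistical learner. Other fusion strategies, such as combining the outputs of per-group classifiers, are equally plausible readings of the abstract.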