Looking Into the Past to Better Classify Web Spam


Executive Summary

Web spamming techniques aim to achieve undeserved rankings in search results. Considerable research has gone into identifying such spam and neutralizing its influence. However, existing spam detection work considers only the current state of each page. The authors argue that historical web page information may also be important for spam classification. In this paper, they use content features extracted from historical versions of web pages to improve spam classification, applying supervised learning techniques to combine classifiers based on current page content with classifiers based on temporal features.
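The final step, combining a current-content classifier with a temporal one, can be illustrated with stacked generalization. This is a minimal sketch under stated assumptions, not the authors' method. The two features (keyword density of the current page, and its mean over archived versions), the toy data, and the helper names train_logistic, predict_proba and fit_stacked are all hypothetical. Each base model is a plain logistic regression fitted by batch gradient descent in standard-library Python. A meta-model is then trained on the base models' outputs for a split they never saw.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: List[float], x: Vector) -> float:
    """Probability of spam; w holds one weight per feature followed by the bias."""
    # zip stops at len(x), so the trailing bias weight is excluded from the dot product.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logistic(X: List[Vector], y: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err  # bias gradient
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def fit_stacked(cur_a, tmp_a, y_a, cur_b, tmp_b, y_b) -> Callable[[Vector, Vector], float]:
    """Hypothetical stacking: base models on split A, meta-model on split B."""
    w_cur = train_logistic(cur_a, y_a)  # classifier on current-content features
    w_tmp = train_logistic(tmp_a, y_a)  # classifier on temporal (historical) features
    # Base-model outputs on the held-out split become the meta-model's inputs.
    meta_X = [[predict_proba(w_cur, c), predict_proba(w_tmp, t)]
              for c, t in zip(cur_b, tmp_b)]
    w_meta = train_logistic(meta_X, y_b)

    def classify(current: Vector, temporal: Vector) -> float:
        return predict_proba(w_meta, [predict_proba(w_cur, current),
                                      predict_proba(w_tmp, temporal)])
    return classify


# Hypothetical toy data (1 = spam): [keyword density now], [mean density in past versions].
cur_a = [[0.8], [0.9], [0.7], [0.1], [0.2], [0.3]]
tmp_a = [[0.9], [0.6], [0.8], [0.1], [0.3], [0.2]]
y_a = [1, 1, 1, 0, 0, 0]
cur_b = [[0.85], [0.6], [0.75], [0.15], [0.4], [0.25]]
tmp_b = [[0.7], [0.85], [0.9], [0.2], [0.1], [0.35]]
y_b = [1, 1, 1, 0, 0, 0]

classify = fit_stacked(cur_a, tmp_a, y_a, cur_b, tmp_b, y_b)
spam_score = classify([0.95], [0.9])
ham_score = classify([0.05], [0.05])
```

Training the meta-model on a split the base models did not see keeps it from simply learning to trust whichever base model overfit most. With real data, cross-validated base predictions would usually replace the single A/B split.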

  • Format: PDF
  • Size: 130.32 KB