Exploring Linguistic Features for Web Spam Detection: A Preliminary Study

Source: Association for Computing Machinery

This paper studies the usability of linguistic features in the Web spam classification task. The features were computed on two Web spam corpora, Webspam-Uk2006 and Webspam-Uk2007, and the authors make them publicly available to other researchers. Preliminary analysis suggests that certain linguistic features may be useful for the spam-detection task when combined with features studied elsewhere.
Format: PDF
Size: 238.10
Date: Apr 2008