Web Spam Detection Using Machine Learning in Specific Domain Features


Executive Summary

In the last few years, as Internet usage has become central to daily life, spam has become a serious problem for the Internet community. Spam pages pose a real threat to all types of users, and that threat keeps evolving with no sign of abating. Different forms of spam have grown dramatically in both volume and negative impact. A large number of e-mails and web pages are considered spam, whether delivered over the Simple Mail Transfer Protocol (SMTP) or surfaced by search engines. Many technical methods have been proposed to address the problem. This paper proposes an improved Naive Bayes classifier that gives weight to information supplied by users and takes into account certain domain-specific features.
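
The summary does not spell out how the classifier is modified, so the following is only a rough sketch of the general idea, not the paper's method. It assumes a multinomial Naive Bayes with Laplace smoothing in which each training example carries a weight, with user-supplied labels weighted more heavily, and in which one domain-specific signal (a page with many outbound links) is encoded as an extra token. The names WeightedNaiveBayes, featurize, USER_WEIGHT, and LINK_THRESHOLD, along with the toy training data, are all hypothetical.

```python
import math
from collections import defaultdict

USER_WEIGHT = 3.0      # assumed: how much more a user-supplied label counts
LINK_THRESHOLD = 20    # assumed: link count above which a page looks link-heavy


def featurize(text, num_links):
    """Turn page text into tokens and append one domain-specific feature token."""
    tokens = text.lower().split()
    if num_links > LINK_THRESHOLD:
        tokens.append("__many_links__")
    return tokens


class WeightedNaiveBayes:
    """Multinomial Naive Bayes with per-example weights and Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_weight = defaultdict(float)
        self.token_weight = defaultdict(lambda: defaultdict(float))
        self.total_tokens = defaultdict(float)
        self.vocab = set()

    def fit(self, examples):
        # Each example is (tokens, label, weight); counts are scaled by weight.
        for tokens, label, weight in examples:
            self.class_weight[label] += weight
            for token in tokens:
                self.token_weight[label][token] += weight
                self.total_tokens[label] += weight
                self.vocab.add(token)
        return self

    def predict(self, tokens):
        total = sum(self.class_weight.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, weight in self.class_weight.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(weight / total)
            denom = self.total_tokens[label] + self.alpha * vocab_size
            for token in tokens:
                count = self.token_weight[label].get(token, 0.0)
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data; the third example stands in for user feedback.
training = [
    (featurize("cheap pills buy now", 50), "spam", 1.0),
    (featurize("conference paper on classifiers", 2), "ham", 1.0),
    (featurize("buy cheap watches now", 40), "spam", USER_WEIGHT),
    (featurize("lecture notes on classifiers", 3), "ham", 1.0),
]
model = WeightedNaiveBayes().fit(training)
print(model.predict(featurize("buy cheap pills", 60)))
print(model.predict(featurize("paper on classifiers", 1)))
```

Scaling the counts by a per-example weight is one simple way to let user feedback influence both the class priors and the token likelihoods. The paper may weight user information differently.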

  • Format: PDF
  • Size: 193.1 KB