Collaborative Spam Filtering With the Hashing Trick


Executive Summary

User feedback is vital to the quality of the collaborative spam filters frequently used in open-membership email systems such as Yahoo Mail or Gmail. Users occasionally designate emails as spam or non-spam (often termed ham), and these labels are subsequently used to train the spam filter. Although most users provide very little data, collectively the amount of training data is very large (many millions of emails per day). Unfortunately, users' notions of what constitutes spam and ham deviate substantially.
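To make the setting concrete, the following is a minimal sketch of one common way the hashing trick can be applied when users disagree about labels. It is an illustration under stated assumptions, not necessarily the method the paper describes. Each token is hashed into a fixed-size weight vector twice: once as a global feature shared by all users and once prefixed with the user's id. The bucket count, the MD5-based hash, the `user_token` naming scheme, and the perceptron-style update are all hypothetical choices made for this example.

```python
import hashlib

NUM_BUCKETS = 2 ** 18  # assumed size of the shared hashed feature space


def bucket(feature: str) -> int:
    """Map a feature string to a bucket index using a stable hash."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def hashed_features(user_id: str, tokens: list[str]) -> dict[int, float]:
    """Hash each token twice: as a global feature and as a per-user feature."""
    vec: dict[int, float] = {}
    for tok in tokens:
        for feat in (tok, f"{user_id}_{tok}"):  # hypothetical naming scheme
            idx = bucket(feat)
            vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


def score(weights: dict[int, float], vec: dict[int, float]) -> float:
    """Linear score; positive means spam, negative means ham."""
    return sum(weights.get(i, 0.0) * v for i, v in vec.items())


def update(weights: dict[int, float], vec: dict[int, float],
           label: int, lr: float = 0.1) -> None:
    """Perceptron-style update on a mistake; label is +1 (spam) or -1 (ham)."""
    if label * score(weights, vec) <= 0:
        for i, v in vec.items():
            weights[i] = weights.get(i, 0.0) + lr * label * v


# Two users disagree about the same token; the per-user buckets absorb the difference.
weights: dict[int, float] = {}
for _ in range(5):
    update(weights, hashed_features("alice", ["newsletter"]), +1)
    update(weights, hashed_features("bob", ["newsletter"]), -1)
```

Because every feature lands in the same fixed-size vector, memory use does not grow with the number of users, while the user-prefixed features leave room for individual preferences. Hash collisions are possible and are simply tolerated in this sketch.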

  • Format: PDF
  • Size: 713.3 KB