Filtering Email Spam in the Presence of Noisy User Feedback
Source: Tufts University
Recent email spam filtering evaluations, such as those conducted at TREC, have shown that near-perfect filtering results are attained with a variety of machine learning methods when filters are given perfectly accurate labeling feedback for training. Yet in real-world settings, labeling feedback may be far from perfect. Real users give feedback that is often mistaken, inconsistent, or even maliciously inaccurate. This paper shows that noisy feedback may harm or even break state-of-the-art spam filters, including recent TREC winners. It then proposes and evaluates several approaches to make such filters robust to label noise. It finds that although such modifications are effective for uniform random label noise, more realistic "natural" label noise from human users remains a difficult challenge.
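
To make the notion of uniform random label noise concrete, the following is a minimal sketch, not the paper's own procedure: it assumes binary "spam"/"ham" labels and flips each one independently with a fixed probability. The function name `flip_labels` and the parameter `noise_rate` are hypothetical and chosen for illustration only.

```python
import random

def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of `labels` with each label flipped independently
    with probability `noise_rate` (uniform random label noise).

    Assumption: labels are the strings "spam" or "ham".
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    opposite = {"spam": "ham", "ham": "spam"}
    return [opposite[y] if rng.random() < noise_rate else y for y in labels]

# Illustrative data only: 1000 alternating labels, 20% noise.
labels = ["spam", "ham"] * 500
noisy = flip_labels(labels, noise_rate=0.2)

# Fraction of labels that actually changed; should be close to noise_rate.
observed = sum(a != b for a, b in zip(labels, noisy)) / len(labels)
```

Natural label noise from human users would not follow this model, since real mistakes tend to depend on the message and the user rather than occurring uniformly at random.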