Learning to Detect Spam: Naive-Euclidean Approach


Executive Summary

A method is proposed for learning to classify emails as spam or non-spam. It combines the Best Stepwise Feature Selection strategy with a Euclidean nearest-neighbor classifier. Each text email is first transformed into a vector in D-dimensional Euclidean space. The emails were divided into training and test sets by 10-fold cross-validation. Three experiments were performed, and their elapsed CPU times and accuracies are reported. The proposed spam detection learner was found to be extremely fast at recognition and to achieve good error rates. It could serve as a baseline learning agent, in terms of both CPU time and accuracy, against which other learning agents can be measured.
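
The classification and evaluation steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the two-dimensional toy vectors, and the "spam"/"ham" labels are assumptions made for illustration, and the Best Stepwise Feature Selection step and the conversion of email text to vectors are omitted. The sketch assumes each email has already been reduced to a numeric vector.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_label(train, query):
    """Return the label of the training vector closest to `query`.

    `train` is a list of (vector, label) pairs (hypothetical layout).
    """
    _, label = min(train, key=lambda item: euclidean(item[0], query))
    return label


def k_fold_accuracy(data, k=10, seed=0):
    """Estimate accuracy by k-fold cross-validation.

    Each fold is held out once as the test set while the remaining
    folds form the training set.
    """
    items = list(data)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    correct = 0
    for i, test_fold in enumerate(folds):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        correct += sum(
            nearest_neighbor_label(train, vec) == label for vec, label in test_fold
        )
    return correct / len(items)


def make_toy_data(n_per_class=50, seed=1):
    """Synthetic, well-separated 2-D clusters standing in for email vectors."""
    rng = random.Random(seed)
    spam = [([rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)], "spam")
            for _ in range(n_per_class)]
    ham = [([rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)], "ham")
           for _ in range(n_per_class)]
    return spam + ham


if __name__ == "__main__":
    accuracy = k_fold_accuracy(make_toy_data(), k=10)
    print(f"10-fold accuracy on toy data: {accuracy:.3f}")
```

Recognition here is a single pass over the training vectors for each query, so its cost grows with the training set size and the dimension D. That is one reason reducing the number of features before classification can matter for speed.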

  • Format: PDF
  • Size: 701.21 KB