This paper describes an e-mail spam filter based on local SVM, that is, an SVM classifier trained only on a neighborhood of the message to be classified rather than on the whole of the available training data. Two problems are stated and solved. First, the choice of neighborhood size is shown to be critical; the proposed solution is based on estimating the a-posteriori probability of a correct decision, and the resulting algorithm is called highest probability SVM nearest neighbor (HP-SVM-NN). Second, the algorithm must be applicable in practice, and the authors propose a practical filter architecture based on HP-SVM-NN. Extensive testing on the SpamAssassin corpus and the TREC 2005 Spam Track corpus shows that HP-SVM-NN outperforms pure SVM and is applicable in practice.
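The idea of training an SVM only on a neighborhood of the query, and picking the neighborhood size by estimated probability of a correct decision, can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the function names `train_linear_svm` and `classify_local`, the candidate sizes `ks`, and the toy 2-D feature vectors are all hypothetical. The abstract does not say how the a-posteriori probability is estimated, so the sketch assumes a simple stand-in: a logistic function of the absolute SVM score at the query.

```python
import math

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=50):
    """Fit a linear soft-margin SVM by subgradient descent on the hinge loss.

    Labels must be +1 (spam) or -1 (ham). Returns (weights, bias).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # L2 shrinkage step
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b

def classify_local(query, X, y, ks=(4, 8)):
    """Classify `query` with an SVM trained on its k nearest neighbors.

    For each candidate k, a local SVM is trained and a confidence is
    computed; the decision with the highest confidence wins.
    ASSUMPTION: confidence = logistic(|score|), a stand-in for the
    paper's a-posteriori probability estimate, which is not given here.
    Returns (label, chosen_k, confidence).
    """
    # Rank training messages by squared Euclidean distance to the query.
    order = sorted(
        range(len(X)),
        key=lambda i: sum((a - q) ** 2 for a, q in zip(X[i], query)),
    )
    best = None
    for k in ks:
        idx = order[:k]
        w, b = train_linear_svm([X[i] for i in idx], [y[i] for i in idx])
        score = sum(wj * qj for wj, qj in zip(w, query)) + b
        prob = 1.0 / (1.0 + math.exp(-abs(score)))
        label = 1 if score >= 0 else -1
        if best is None or prob > best[2]:
            best = (label, k, prob)
    return best

# Toy data: four "spam" points (+1) and four "ham" points (-1).
X_train = [(2, 2), (3, 2), (2, 3), (3, 3),
           (-2, -2), (-3, -2), (-2, -3), (-3, -3)]
y_train = [1, 1, 1, 1, -1, -1, -1, -1]

label, k, prob = classify_local((2.5, 2.5), X_train, y_train)
```

In a real filter the feature vectors would come from message content, and the candidate neighborhood sizes and the probability estimator would need to be chosen and validated on held-out data.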