An Efficient Two-phase Spam Filtering Method Based on E-mails Categorization
An e-mail's header section usually contains important attributes, such as the e-mail title, the sender's name, the sender's e-mail address, and the sending date, which are helpful for classifying e-mails. In this paper, the authors apply a decision tree data mining technique to the header's basic attributes to analyze the association rules of spam e-mails, and propose an efficient spam filtering method that accurately distinguishes spam from legitimate e-mails. In an experiment applying their spam filtering method to numerous Chinese e-mails, they obtain the following results: the Accuracy is 96.5%, the Precision is 96.67%, and the Recall is 96.3%. Thus, the method proposed in this paper can efficiently identify spam e-mails by checking only the header section, which reduces the computational cost.
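As an illustration only, the following minimal Python sketch shows how the header attributes named above could be extracted from a raw message, and how Accuracy, Precision, and Recall are conventionally computed from binary predictions. It is not the authors' implementation: the function names `header_features` and `evaluate`, the feature keys, and the sample message are hypothetical, and the decision tree itself is not reproduced here.

```python
from email import message_from_string
from email.utils import parseaddr


def header_features(raw_message: str) -> dict:
    """Extract basic header attributes (hypothetical feature names)."""
    msg = message_from_string(raw_message)
    # Split "Name <address>" into its two parts.
    sender_name, sender_address = parseaddr(msg.get("From", ""))
    return {
        "title": msg.get("Subject", ""),
        "sender_name": sender_name,
        "sender_address": sender_address,
        "sending_date": msg.get("Date", ""),
    }


def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall), treating 1 as spam."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up sample message, used only to exercise the sketch.
SAMPLE = (
    "From: Alice <alice@example.com>\n"
    "Subject: Hello\n"
    "Date: Mon, 1 Jan 2024 10:00:00 +0000\n"
    "\n"
    "Body text is ignored by a header-only filter.\n"
)

features = header_features(SAMPLE)
metrics = evaluate([1, 1, 0, 0], [1, 0, 0, 0])
```

Because only the header is parsed, the message body never has to be tokenized or analyzed, which is the source of the reduced computational cost described in the abstract.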