Improved Near Duplicate Matching Scheme for E-Mail Spam Detection
E-mail spam is a major problem facing e-mail users today. In recent years, many schemes have been developed to detect spam e-mails. The primary idea of the similarity matching scheme for spam detection is to maintain a database of known spam, formed from user feedback, and to use it to block subsequent near-duplicate spam. The authors propose a novel e-mail abstraction scheme that represents an e-mail by its layout structure. They present a procedure to generate the e-mail abstraction from the HTML content of an e-mail, and this newly devised abstraction can capture the near-duplicate phenomenon of spam more effectively.
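The idea of a layout-based abstraction matched against a known spam database can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes the abstraction is simply the ordered sequence of HTML tag names, with text and attributes discarded, and that matching is exact equality of that sequence. The names `email_abstraction` and `KnownSpamDatabase` are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class TagSequenceParser(HTMLParser):
    """Collects tag names in document order, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.tags.append("</" + tag + ">")


def email_abstraction(html_body):
    """Assumed abstraction: the tuple of tags found in the HTML body."""
    parser = TagSequenceParser()
    parser.feed(html_body)
    parser.close()
    return tuple(parser.tags)


class KnownSpamDatabase:
    """Hypothetical store of abstractions of e-mails that users reported as spam."""

    def __init__(self):
        self._abstractions = set()

    def report_spam(self, html_body):
        # User feedback: remember the layout of a reported spam e-mail.
        self._abstractions.add(email_abstraction(html_body))

    def is_near_duplicate(self, html_body):
        # Simplifying assumption: a near-duplicate has an identical tag sequence.
        return email_abstraction(html_body) in self._abstractions


# Two spam e-mails with different wording and links but the same layout.
spam_a = "<html><body><p>Buy cheap watches now</p><a href='http://x.example'>click</a></body></html>"
spam_b = "<html><body><p>Best pri ces on wat ches</p><a href='http://y.example'>here</a></body></html>"
ham = "<html><body><h1>Meeting</h1><p>See you at 3</p></body></html>"

db = KnownSpamDatabase()
db.report_spam(spam_a)
print(db.is_near_duplicate(spam_b))  # True
print(db.is_near_duplicate(ham))     # False
```

Because the text content is ignored, rewording the message or changing the link target does not alter the abstraction, which is why the second spam e-mail still matches. Exact equality is the crudest possible matching rule; a real system would need a more tolerant comparison and a more careful treatment of malformed HTML.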