International Journal Of Engineering And Computer Science
Email communication is widespread and essential today. However, the threat of unsolicited junk email, also known as spam, is becoming increasingly serious. The basic idea of the similarity-matching scheme for spam detection is to maintain a database of known spam, built from user feedback, and use it to block subsequent near-duplicate spam. To achieve efficient similarity matching and reduce storage utilization, prior works mainly represent each email by a succinct abstraction derived from the text of its content. However, these abstractions cannot fully capture the evolving nature of spam and are therefore not effective enough for near-duplicate detection.
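As an illustration of the general similarity-matching idea described above, the following is a minimal sketch and not the method of any particular prior work. It assumes, purely for illustration, that the abstraction of an email is its set of three-word shingles and that two emails are near-duplicates when the Jaccard similarity of their abstractions reaches a threshold. The names `SpamDatabase`, `report_spam`, `is_near_duplicate`, and the threshold value of 0.5 are hypothetical.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of text (an assumed abstraction)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Very short emails collapse to a single shingle, or to nothing.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class SpamDatabase:
    """Hypothetical store of abstractions of emails that users reported as spam."""

    def __init__(self, threshold=0.5, k=3):
        self.threshold = threshold  # assumed cut-off, not taken from the source
        self.k = k
        self.abstractions = []

    def report_spam(self, text):
        # User feedback: keep only the abstraction, not the full email text.
        self.abstractions.append(shingles(text, self.k))

    def is_near_duplicate(self, text):
        # Linear scan over known spam; a real system would need an index.
        candidate = shingles(text, self.k)
        return any(
            jaccard(candidate, known) >= self.threshold
            for known in self.abstractions
        )
```

In this sketch, an incoming email that changes only a word or two of a reported spam still shares most of its shingles with the stored abstraction and is blocked, while an unrelated email shares none. Because the abstraction is derived from the content text alone, spam that is reworded heavily would fall below the threshold, which is the kind of limitation the paragraph above points to.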