International Journal for Development of Computer Science & Technology (IJDCST)
In recent years, collaborative spam filtering with a near-duplicate similarity matching scheme has been widely discussed. The primary idea of the similarity matching scheme for spam detection is to maintain a database of known spam, formed from user feedback, and to use it to block subsequent near-duplicate spam. Prior approaches generate e-mail abstractions based mainly on hashes of the content text. The improvement is limited since the authors map each subsequence in a node of a SpTree to a hash value. Therefore, subsequences that have some prefix tags in common can still be differentiated with one comparison.
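The general idea of a feedback-driven known-spam database with hash-based content abstractions can be illustrated with a minimal sketch. This is not the scheme of any particular prior work: the word-shingle abstraction, the Jaccard similarity measure, the 0.6 threshold, and the names `abstraction`, `jaccard`, and `SpamDatabase` are all assumptions chosen only for illustration.

```python
import hashlib
import re


def abstraction(text, k=3):
    """Hypothetical e-mail abstraction: a set of truncated hashes of k-word shingles."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return frozenset()
    if len(words) < k:
        shingles = [" ".join(words)]
    else:
        shingles = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return frozenset(
        hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in shingles
    )


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


class SpamDatabase:
    """Known-spam database built from user feedback (illustrative only)."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold  # assumed similarity cut-off
        self.known = []             # abstractions of reported spam

    def report_spam(self, text):
        """Record the abstraction of an e-mail that a user reported as spam."""
        self.known.append(abstraction(text))

    def is_near_duplicate(self, text):
        """Return True if the e-mail is close enough to any reported spam."""
        candidate = abstraction(text)
        return any(jaccard(candidate, k) >= self.threshold for k in self.known)


db = SpamDatabase()
db.report_spam("Buy cheap watches now at our online store today limited offer")
spam_variant = "Buy cheap watches now at our online store today limited offer!!! xyz"
legit = "Meeting moved to Thursday afternoon please confirm attendance"
print(db.is_near_duplicate(spam_variant), db.is_near_duplicate(legit))
```

Because this sketch hashes only the content text, a spammer who rewrites the wording can evade the match, which is the kind of weakness of content-based abstractions that the passage above points to.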