Email Classification for Spam Detection Using Word Stemming
Unsolicited emails, known as spam, are one of the fast growing and costly problems associated with the Internet today. Among the many proposed solutions, a technique using Bayesian filtering is considered as the most effective weapon against spam. Bayesian filtering works by evaluating the probability of different words appearing in legitimate and spam mails and then classifying them based on that probability. Most of the current spam email detection systems use keywords to detect spam emails. These keywords can be written as misspellings eg: baank or bannk instead of bank. Misspellings are changed from time to time and hence spam email detection system needs to constantly update the blacklist to detect spam emails containing misspellings.