Challenges in Mitigating Phishing and Spam e-Mails
Source: University at Buffalo
Phishing is a credible threat on the Internet, and a more serious one than spam. In a phishing attack, the attacker uses forged e-mails and Websites that appear to originate from a legitimate organization in order to deceive users into disclosing personal, financial, or computer account information. Attackers can then use the stolen information for criminal purposes such as identity theft, larceny, and fraud. Detecting phishing e-mails is a hard problem. Traditional spam filters perform poorly when deployed to detect phishing e-mails. Current phishing e-mail detection techniques rely on machine learning algorithms that operate on specialized feature sets characterizing phishing e-mails. Even though these approaches use specialized features, they overlook a simple yet crucial fact: most e-mails from legitimate financial institutions share the same features as phishing e-mails. An accurate and fair measure of a classifier's performance must therefore examine its primary objective of segregating e-mails into two classes, phishing and ham, where the ham set includes e-mails from legitimate financial institutions. This would help to isolate the features that are specific to phishing e-mails and to exclude features that are also common in e-mails from legitimate financial institutions. To achieve this goal, this paper proposes a novel technique that captures the semantic meaning of e-mails. This would help to identify the true intent of an e-mail, thereby improving classifier performance.
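The evaluation concern raised above can be illustrated with a minimal sketch. It is not the technique proposed in the paper: the cue list, the threshold rule, and the sample messages are all hypothetical, chosen only to show how a filter based on surface features can look accurate on ordinary ham yet misclassify legitimate financial e-mails once they are included in the ham set.

```python
# Hypothetical surface cues of the kind a feature-based filter might rely on.
# These are illustrative assumptions, not features taken from the paper.
SURFACE_CUES = ("account", "verify", "password", "bank", "click")


def flags_as_phishing(text, threshold=2):
    """Toy rule: flag an e-mail when it contains at least `threshold` cues."""
    lowered = text.lower()
    hits = sum(1 for cue in SURFACE_CUES if cue in lowered)
    return hits >= threshold


def false_positive_rate(ham_emails):
    """Fraction of legitimate e-mails wrongly flagged as phishing."""
    flagged = sum(1 for text in ham_emails if flags_as_phishing(text))
    return flagged / len(ham_emails)


# Made-up example messages, not data from the paper.
ordinary_ham = [
    "Lunch at noon tomorrow?",
    "Attached are the meeting notes from Tuesday.",
    "Your library book is due next week.",
]
financial_ham = [
    "Your bank statement is ready. Log in to your account to view it.",
    "We updated our privacy policy for your bank account.",
    "Reminder: never share your password. Contact your bank with questions.",
]

# Ham set without legitimate financial e-mails.
fpr_ordinary = false_positive_rate(ordinary_ham)
# Ham set that also includes legitimate financial e-mails.
fpr_with_financial = false_positive_rate(ordinary_ham + financial_ham)

print(f"False positive rate, ordinary ham only: {fpr_ordinary:.2f}")
print(f"False positive rate, financial ham included: {fpr_with_financial:.2f}")
```

On this toy data the false positive rate rises from 0.00 to 0.50 when the legitimate financial messages are added, which is the kind of gap that an evaluation limited to ordinary ham would hide.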