Association for Computing Machinery
Phishing sites have become a common means of stealing sensitive information, such as usernames, passwords, and credit card details, from internet users. The authors propose a semi-supervised machine learning approach to detect phishing URLs in a set of phishing and spam URLs collected from spam emails. In practice, phishing URLs make up only a small share of the URLs received through such emails. The study therefore aims to detect phishing URLs in a realistic scenario: a highly imbalanced data set in which phishing and spam URLs occur in a 1:654 ratio.
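To make the 1:654 imbalance concrete, the short sketch below works out what it implies for a naive baseline. It is an illustration only, not the authors' method: it assumes the ratio means one phishing URL for every 654 spam URLs, and the function names `phishing_fraction` and `majority_baseline` are hypothetical.

```python
from fractions import Fraction

# Assumption: the 1:654 ratio is read as 1 phishing URL per 654 spam URLs.
PHISHING, SPAM = 1, 654


def phishing_fraction(phishing: int = PHISHING, spam: int = SPAM) -> Fraction:
    """Exact share of phishing URLs in the data set."""
    return Fraction(phishing, phishing + spam)


def majority_baseline(phishing: int = PHISHING, spam: int = SPAM) -> dict:
    """Metrics for a hypothetical classifier that labels every URL as spam."""
    total = phishing + spam
    return {
        # Every spam URL is labeled correctly, every phishing URL is missed.
        "accuracy": spam / total,
        # No phishing URL is ever flagged, so recall on that class is zero.
        "phishing_recall": 0.0,
    }


if __name__ == "__main__":
    print(f"phishing share: {float(phishing_fraction()):.4%}")
    print(majority_baseline())
```

Under this reading, phishing URLs are 1 in 655 of the data (roughly 0.15%), so a classifier that never flags anything would still reach about 99.85% accuracy while detecting no phishing at all. This is why plain accuracy is a poor yardstick for such data sets and per-class measures such as recall on the phishing class are more informative.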