Large-Scale Automatic Classification of Phishing Pages
Phishing websites, fraudulent sites that impersonate a trusted third party to gain access to private data, continue to cost Internet users over a billion dollars each year. This paper describes the design and performance characteristics of a scalable machine learning classifier developed to detect phishing websites. The classifier is used to maintain Google's phishing blacklist automatically. It analyzes millions of pages a day, examining both the URL and the contents of each page to determine whether that page is phishing.
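The abstract does not specify the model or its features. As a rough illustration of what "examining the URL and the contents of a page" can mean, here is a minimal sketch that assumes a linear (logistic) scoring model over a handful of binary features. The feature names, the hand-set weights in `WEIGHTS`, and the 0.5 threshold are all hypothetical and are not taken from the classifier described here. A real system would learn its weights from labeled pages.

```python
import math
import re
from urllib.parse import urlparse

# Hypothetical weights, set by hand for illustration only;
# a real classifier would learn these from labeled training pages.
WEIGHTS = {
    "bias": -3.0,
    "host_is_ip": 2.5,
    "many_subdomains": 1.0,
    "login_keyword_in_url": 1.2,
    "password_field": 1.5,
}

# Matches a dotted-quad IPv4 hostname such as 192.168.0.1.
IP_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_features(url: str, html: str) -> dict[str, float]:
    """Map a URL and its page HTML to binary features (all hypothetical)."""
    host = urlparse(url).hostname or ""
    lowered_url = url.lower()
    return {
        "bias": 1.0,
        # URL features: raw IP hosts and deep subdomain chains are common in phishing URLs.
        "host_is_ip": float(bool(IP_HOST.match(host))),
        "many_subdomains": float(host.count(".") >= 4),
        "login_keyword_in_url": float(
            any(k in lowered_url for k in ("login", "signin", "verify"))
        ),
        # Page-content feature: a password input suggests a credential-harvesting form.
        "password_field": float('type="password"' in html.lower()),
    }


def phishing_probability(url: str, html: str) -> float:
    """Logistic score: the sigmoid of the weighted feature sum."""
    feats = extract_features(url, html)
    z = sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def is_phishing(url: str, html: str, threshold: float = 0.5) -> bool:
    """Label a page as phishing when its score reaches the (hypothetical) threshold."""
    return phishing_probability(url, html) >= threshold


if __name__ == "__main__":
    print(is_phishing("http://192.168.0.1/login/verify.php", '<input type="password">'))
    print(is_phishing("https://www.example.com/about", "<p>Hello</p>"))
```

Scoring every page independently with a cheap linear model like this is one way such a classifier could keep up with millions of pages a day. That throughput argument is an assumption made for this sketch, not a claim about the system described above.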