Date Added: Mar 2011
Phishing is an increasingly sophisticated method of stealing personal user information through sites that pretend to be legitimate. In this paper, the authors take the following steps to identify phishing URLs. First, they carefully select lexical features of the URLs that are resistant to the obfuscation techniques used by attackers. Second, they evaluate classification accuracy when using only lexical features, both automatically selected and hand-selected, versus when using additional features. They show that lexical features are sufficient for all practical purposes. Third, they thoroughly compare several classification algorithms and propose using an online method, Adaptive Regularization of Weights (AROW), that can overcome noisy training data.
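To make the approach concrete, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes a simple bag-of-words tokenization of the hostname and path as the lexical features, and a diagonal-covariance variant of AROW with regularization parameter `r`. The names `lexical_features` and `DiagonalAROW`, the choice `r=1.0`, and the toy URLs are all hypothetical and chosen only for illustration; the paper's actual feature set and settings are not given in this abstract.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url):
    """Illustrative lexical features: binary bag-of-words over URL tokens,
    tagged by whether the token came from the hostname or the path/query."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    feats = {"bias": 1.0}
    sources = (
        ("host", parts.hostname or ""),
        ("path", parts.path + " " + parts.query),
    )
    for tag, text in sources:
        for tok in re.split(r"[^a-z0-9]+", text.lower()):
            if tok:
                feats[f"{tag}:{tok}"] = 1.0
    return feats


class DiagonalAROW:
    """Online binary classifier (labels +1 / -1) using the AROW update
    with the covariance matrix restricted to its diagonal (an assumption
    made here to keep the sketch sparse and short)."""

    def __init__(self, r=1.0):
        self.r = r        # regularization; larger r means smaller updates
        self.w = {}       # mean weight per feature (default 0.0)
        self.sigma = {}   # per-feature variance (default 1.0)

    def margin(self, x):
        return sum(self.w.get(k, 0.0) * v for k, v in x.items())

    def predict(self, x):
        return 1 if self.margin(x) >= 0 else -1

    def update(self, x, y):
        m = self.margin(x)
        if y * m >= 1.0:
            return  # margin already satisfied, no update
        # Confidence term: x^T Sigma x under the diagonal assumption.
        v = sum(self.sigma.get(k, 1.0) * val * val for k, val in x.items())
        beta = 1.0 / (v + self.r)
        alpha = (1.0 - y * m) * beta
        for k, val in x.items():
            s = self.sigma.get(k, 1.0)
            self.w[k] = self.w.get(k, 0.0) + alpha * y * s * val
            self.sigma[k] = s - beta * (s * val) ** 2


# Toy, made-up training data: +1 = phishing, -1 = legitimate.
data = [
    ("http://paypal.com.secure-login.example.net/update/account", 1),
    ("http://192.0.2.7/signin/bank/verify", 1),
    ("https://www.example.org/docs/index.html", -1),
    ("https://news.example.com/2011/03/article", -1),
]

model = DiagonalAROW(r=1.0)
for _ in range(5):  # a few passes over the toy stream
    for url, label in data:
        model.update(lexical_features(url), label)

predictions = [model.predict(lexical_features(url)) for url, _ in data]
```

Because each update shrinks the variance of the features it touches, frequently seen tokens receive progressively smaller corrections, which is the mechanism that makes this family of methods tolerant of occasional mislabeled examples. A real evaluation would need a large labeled URL corpus and held-out data rather than the four toy URLs used here.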