Learning to Detect Malicious URLs
Malicious web sites are a cornerstone of Internet criminal activities. The dangers of these sites have created a demand for safeguards that protect end-users from visiting them. This paper explores how to detect malicious web sites from the lexical and host-based features of their URLs. The authors show that this problem lends itself naturally to modern algorithms for online learning. Online algorithms not only process large numbers of URLs more efficiently than batch algorithms, but also adapt more quickly to new features in the continuously evolving distribution of malicious URLs. The authors develop a real-time system for gathering URL features and pair it with a real-time feed of labeled URLs from a large Web mail provider.
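To make the idea of online learning over URL features concrete, the following is a minimal sketch, not the authors' system. It assumes a plain perceptron as a stand-in for the online algorithms the abstract refers to, uses only lexical features (host-based features such as registration or IP data are omitted), and runs on made-up URLs with made-up labels. The names `lexical_features` and `OnlinePerceptron` are hypothetical.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url):
    """Return binary bag-of-words features from the hostname, path, and query."""
    # Assumption: URLs without a scheme are treated as http for parsing only.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host_tokens = [t for t in (parts.hostname or "").split(".") if t]
    # Split the path and query on common URL delimiters.
    raw = parts.path + "?" + parts.query
    path_tokens = [t for t in re.split(r"[/?.=&_-]+", raw) if t]
    features = {"bias"}
    features.update("host:" + t for t in host_tokens)
    features.update("path:" + t for t in path_tokens)
    return features


class OnlinePerceptron:
    """Mistake-driven linear classifier over a growing, sparse feature set."""

    def __init__(self):
        self.weights = {}  # feature name -> weight; new features appear on demand

    def score(self, features):
        return sum(self.weights.get(f, 0.0) for f in features)

    def predict(self, features):
        # +1 means malicious, -1 means benign.
        return 1 if self.score(features) > 0 else -1

    def update(self, features, label):
        """Process one labeled example; return True if the model erred on it."""
        if self.predict(features) == label:
            return False
        for f in features:
            self.weights[f] = self.weights.get(f, 0.0) + label
        return True


if __name__ == "__main__":
    # Hypothetical labeled stream; the URLs and labels are illustrative only.
    stream = [
        ("http://example.com/index.html", -1),
        ("http://login.example.test/secure/update", +1),
        ("http://example.org/docs/intro", -1),
        ("http://login.example.test/secure/verify", +1),
    ]
    model = OnlinePerceptron()
    mistakes = 0
    for url, label in stream:
        mistakes += model.update(lexical_features(url), label)
    print("mistakes on stream:", mistakes)
```

Because the weights live in a dictionary keyed by feature name, tokens never seen before can enter the model as soon as they occur in the stream. This is the property the abstract points to when it says online methods adapt quickly to new features. Each example is processed once and then discarded, which keeps the per-URL cost proportional to the number of features in that URL.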