University of Geneva
Malicious web pages that use drive-by download attacks or social engineering techniques to install unwanted software on a user's computer have become the main avenue for the propagation of malicious code. To search for malicious web pages, the first step is typically to use a crawler to collect URLs that are live on the Internet. Then, fast pre-filtering techniques are employed to reduce the number of pages that must be examined by more precise, but slower, analysis tools (such as honeyclients). While effective, these techniques require substantial resources. A key reason is that most of the pages the crawler encounters are benign; that is, the "toxicity" of the stream of URLs being analyzed (the fraction of those URLs that are malicious) is low.
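The two-stage search and the notion of toxicity can be made concrete with a small sketch. It assumes that toxicity is simply the fraction of malicious URLs in a stream, and it uses hypothetical `prefilter` and `honeyclient` callables as stand-ins for the fast pre-filter and the slow, precise analyzer. It is not the authors' implementation. The cost of the slow stage grows with the number of URLs that survive the pre-filter, so a low-toxicity stream spends most of that budget on benign pages.

```python
from typing import Callable, Iterable, List, Tuple


def toxicity(labels: Iterable[bool]) -> float:
    """Fraction of URLs in a stream that are malicious (True = malicious).

    Assumption: toxicity is defined as a simple ratio over the stream.
    """
    labels = list(labels)
    if not labels:
        return 0.0
    return sum(labels) / len(labels)


def analyze_stream(
    urls: Iterable[str],
    prefilter: Callable[[str], bool],    # hypothetical fast, imprecise check
    honeyclient: Callable[[str], bool],  # hypothetical slow, precise check
) -> Tuple[List[str], List[str]]:
    """Run the cheap pre-filter on every URL, then the expensive analyzer
    only on the URLs the pre-filter flags as suspicious.

    Returns (suspicious, malicious); len(suspicious) is the number of
    expensive honeyclient runs the pipeline had to pay for.
    """
    suspicious = [u for u in urls if prefilter(u)]
    malicious = [u for u in suspicious if honeyclient(u)]
    return suspicious, malicious


if __name__ == "__main__":
    # Toy stream of 1,000 URLs, 10 of which are (hypothetically) malicious.
    urls = [f"http://example.com/page{i}" for i in range(1000)]
    bad = set(urls[:10])
    print(toxicity(u in bad for u in urls))  # 0.01
```

With a toxicity of 0.01, even a perfect pre-filter would be examining a stream in which 99 out of every 100 URLs are benign. This is the resource problem the paragraph describes.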