Mercator as a Web Crawler

Date Added: Jan 2012
Format: PDF

This paper describes Mercator, a scalable, extensible web crawler written entirely in Java. Scalable web crawlers are an important component of many web services, but their design is not well documented in the literature. In this paper, the authors enumerate the major components of any scalable web crawler, comment on alternatives and tradeoffs in their design, and describe the particular components used in Mercator. They also describe Mercator's support for extensibility and customizability. Finally, they comment on Mercator's performance, which they have found to be efficient and comparable to that of other crawlers.