Realizing Peer-to-Peer and Distributed Web Crawler

Date Added: Jun 2012
Format: PDF

The tremendous growth of the World Wide Web has made tools such as search engines and information retrieval systems essential. In this dissertation, the authors propose a fully distributed, peer-to-peer architecture for web crawling. The main goal behind the development of such a system is to provide an efficient, easily implementable, decentralized alternative for crawling, indexing, caching and querying web pages. The main function of a web crawler is to visit web pages recursively: it fetches a page, extracts all URLs from it, parses it for keywords, and then visits each extracted URL in the same way.
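The basic crawl loop described above can be sketched as follows. This is a minimal single-process illustration under stated assumptions, not the authors' distributed implementation: the `crawl` function, the caller-supplied `fetch` callback, the `max_pages` limit and the toy in-memory `PAGES` site are all hypothetical, and "parsing for keywords" is reduced to collecting lower-cased alphabetic tokens into an inverted index. Literal recursion is replaced by an explicit FIFO queue (breadth-first order), which reaches the same pages without running into Python's recursion limit.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkAndTextParser(HTMLParser):
    """Collects the href targets of <a> tags and the text content of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed` (hypothetical helper, for illustration).

    `fetch(url)` is an assumed callback returning the page HTML, or None if the
    page cannot be retrieved. Returns an inverted index: keyword -> set of URLs.
    """
    index = {}
    visited = set()
    frontier = deque([seed])
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()  # flush any text still buffered by the parser
        # Parse the page for keywords (simplified: lower-cased alphabetic tokens).
        text = " ".join(parser.text_parts).lower()
        for word in re.findall(r"[a-z]+", text):
            index.setdefault(word, set()).add(url)
        # Extract URLs, resolve relative links, drop #fragments, queue unseen ones.
        for href in parser.links:
            absolute = urldefrag(urljoin(url, href)).url
            if absolute not in visited:
                frontier.append(absolute)
    return index


# Usage with a toy in-memory "web" instead of real HTTP requests.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> Peer crawler home',
    "http://example.test/a": '<a href="/">home</a> <a href="b">B</a> distributed indexing',
    "http://example.test/b": "caching and querying",
}
index = crawl("http://example.test/", PAGES.get)
print(sorted(index["home"]))  # ['http://example.test/', 'http://example.test/a']
```

A distributed version would additionally need a rule for which peer is responsible for which URL and a way to exchange discovered URLs between peers; this summary does not say how the authors make that assignment.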