Realizing Peer-to-Peer and Distributed Web Crawler
The tremendous growth of the World Wide Web has made tools such as search engines and information retrieval systems essential. In this dissertation, the authors propose a fully distributed, peer-to-peer architecture for web crawling. The main goal behind the development of such a system is to provide an alternative that is efficient, easily implementable, and decentralized for crawling, indexing, caching, and querying web pages. The core function of a web crawler is to recursively visit web pages, extract all URLs from each page, parse the page for keywords, and then visit the extracted URLs in turn.
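The visit/extract/parse loop described above can be sketched as follows. This is a minimal single-node illustration in Python, not the distributed system the dissertation proposes; the page limit, timeout, and the naive whitespace keyword split are assumptions made for brevity.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href targets and visible text (the raw material for keywords)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def crawl(seed, max_pages=10):
    """Breadth-first crawl: visit a page, extract its URLs, enqueue unseen ones."""
    visited, frontier = set(), deque([seed])
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue  # skip unreachable pages
        visited.add(url)
        parser = LinkExtractor()
        parser.feed(html)
        keywords = " ".join(parser.text).split()  # naive keyword extraction
        for link in parser.links:
            frontier.append(urljoin(url, link))  # resolve relative URLs
    return visited
```

A queue-based breadth-first traversal with a `visited` set is the usual way to make the "recursive" visiting terminate on a graph with cycles; in the peer-to-peer setting, the frontier and the visited set would be partitioned across peers rather than held on one machine.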