A Novel Parallel Domain Focused Crawler for Reduction in Load on the Network
Source: Kurukshetra University
The World Wide Web is a collection of hyperlinked documents available in HTML format. Because the web is growing and dynamic, traversing all the URLs found in web documents has become a challenge. The list of URLs is very large and difficult to refresh quickly, as 40% of web pages change daily. As a result, web crawlers consume a large share of network resources, bandwidth in particular, to keep their repositories up to date. This paper therefore proposes a Parallel Domain Focused Crawler that searches for and retrieves only those web pages that are related to a specific domain, and skips irrelevant domains.
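The abstract does not spell out how pages are matched to a domain, so the following is only a minimal sketch of the general idea of skipping irrelevant domains before any download takes place. It assumes that "domain" means a hostname suffix such as `edu` or `ac.in`; the function names `in_domain` and `filter_frontier` are hypothetical and are not taken from the paper.

```python
from urllib.parse import urlparse


def in_domain(url, domain_suffix):
    """Return True if the URL's host equals the suffix or is a subdomain of it.

    Assumption: a "domain" is identified by a hostname suffix (e.g. "edu").
    """
    host = urlparse(url).hostname or ""  # hostname is already lowercased
    suffix = domain_suffix.lower().lstrip(".")
    return host == suffix or host.endswith("." + suffix)


def filter_frontier(urls, domain_suffix):
    """Keep only in-domain URLs, preserving order and dropping duplicates."""
    seen = set()
    kept = []
    for url in urls:
        if url not in seen and in_domain(url, domain_suffix):
            seen.add(url)
            kept.append(url)
    return kept


if __name__ == "__main__":
    candidates = [
        "http://www.example.edu/index.html",
        "http://www.example.com/index.html",
        "http://www.example.edu/index.html",
        "http://notedu.example.org/",
    ]
    print(filter_frontier(candidates, "edu"))
```

Filtering the URL list before fetching is where the bandwidth saving would come from in a design like this: out-of-domain pages are never requested. In a parallel setting, one such filter could run per crawler process, each with its own suffix, though the abstract does not confirm that arrangement.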