Crawler-Friendly Web Servers

Executive Summary

A web crawler is a program that automatically downloads pages from the Web. A typical crawler starts with a seed set of pages (e.g., yahoo.com and aol.com). It then downloads these pages, extracts their hyperlinks, and crawls the pages those hyperlinks point to. The crawler repeats this step until there are no more pages to crawl or some resource (e.g., time or network bandwidth) is exhausted. This method is referred to in this paper as conventional crawling. In many cases it is important to keep the crawled pages "fresh," or up-to-date, for example, when the pages are used by a web search engine such as AltaVista or Google. The crawler therefore splits its resources between crawling new pages and checking whether previously crawled pages have changed.
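
As a concrete illustration, here is a minimal Python sketch of the conventional crawling loop described above. The seed URLs, the page budget (standing in for limits on time or network bandwidth), and the regex-based link extraction are illustrative assumptions, not details from the paper:

```python
import re
import urllib.parse
import urllib.request

def crawl(seed_urls, max_pages=100):
    """Conventional crawling sketch: download pages from a frontier,
    extract hyperlinks, and enqueue unseen URLs until the frontier is
    empty or the page budget (a stand-in for resource limits) runs out."""
    frontier = list(seed_urls)   # URLs waiting to be crawled (FIFO)
    seen = set(seed_urls)        # URLs already enqueued, to avoid repeats
    pages = {}                   # url -> downloaded HTML

    while frontier and len(pages) < max_pages:
        url = frontier.pop(0)
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue             # skip unreachable or failing pages
        pages[url] = html
        # Naive href extraction; a real crawler would use an HTML parser.
        for href in re.findall(r'href=["\'](.*?)["\']', html, re.I):
            link = urllib.parse.urljoin(url, href)
            if link.startswith("http") and link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages
```

The FIFO frontier makes this a breadth-first crawl. A production crawler would additionally honor robots.txt and, as the summary notes, revisit previously crawled pages to keep them fresh.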

  • Format: PDF
  • Size: 237.5 KB