Analyzing Fine-Grained Hypertext Features for Enhanced Crawling and Topic Distillation

Executive Summary

Early Web search engines closely resembled Information Retrieval (IR) systems, which had matured over several decades. Between about 1996 and 1999, it became clear that the spontaneous formation of hyperlink communities in the Web graph had much to offer Web search, leading to a flurry of research on hyperlink-based ranking of query responses. In this paper, the authors show that, over and above inter-page hyperlinks, much semantic information can be teased out of the manner in which markup tags, such as those for menu bars, tables, and lists, are used to organize pages, and of the context in which hyperlinks are made from one page to another.

  • Format: PDF
  • Size: 149.1 KB