Analyzing Fine-Grained Hypertext Features for Enhanced Crawling and Topic Distillation

Date Added: Jan 2011
Format: PDF

Early Web search engines closely resembled Information Retrieval (IR) systems, which had matured over several decades. Around 1996 to 1999, it became clear that the spontaneous formation of hyperlink communities in the Web graph had much to offer Web search, leading to a flurry of research on hyperlink-based ranking of query responses. In this paper, the authors show that, over and above inter-page hyperlinks, much semantic information can be teased out of the manner in which markup tags are used to organize pages into structures such as menu bars, tables, and lists, and out of the context in which hyperlinks are made from one page to another.