Exploiting Site-Level Information To Improve Web Search
The ranking of Web search results has long since evolved beyond simple bag-of-words retrieval models. Modern search engines routinely employ machine-learned ranking that relies on exogenous relevance signals. Yet the majority of current methods still evaluate each Web page out of context. In this paper, the authors introduce a novel source of relevance information for Web search by evaluating each page in the context of its host Web site. For this purpose, they devise two strategies for compactly representing entire Web sites. They formalize the approach by building two indices: a traditional page index and a new site index, in which each "document" represents an entire Web site.
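
The two-index idea can be illustrated with a minimal sketch. The abstract does not specify the two site-representation strategies, so the code below assumes the simplest possible one: a site "document" is the merged term counts of all pages sharing a host name. The function names `tokenize` and `build_indices`, the whitespace tokenizer, and the example URLs are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for illustration).
    return text.lower().split()


def build_indices(pages):
    """Build a page index and a site index from a {url: text} mapping.

    page_index maps each URL to its term counts.
    site_index maps each host to the summed term counts of its pages,
    so that one "document" stands for the whole Web site.
    """
    page_index = {}
    site_index = defaultdict(Counter)
    for url, text in pages.items():
        terms = Counter(tokenize(text))
        page_index[url] = terms
        host = urlparse(url).netloc  # group pages by host name
        site_index[host].update(terms)
    return page_index, dict(site_index)


# Toy collection with two hosts (hypothetical data).
pages = {
    "http://example.org/a": "web search ranking",
    "http://example.org/b": "site level ranking signals",
    "http://example.net/x": "cooking recipes",
}
page_index, site_index = build_indices(pages)
```

With both indices in place, a ranker could score a page against the page index and then look up its host in the site index to obtain an additional site-level signal. How the two scores are combined is left open here, since the abstract does not describe it.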