TREC 2010 Web Track Notebook: Term Dependence, Spam Filtering and Quality Bias

Executive Summary

Many existing retrieval approaches treat all documents in a collection equally and do not take the content quality of the retrieved documents into account. In their submissions to the TREC 2010 Web Track, the authors use quality-biased ranking methods that aim to promote documents likely to contain high-quality content and to penalize spam and low-quality documents. Experiments with the ad hoc web topics from TREC 2010 show that features such as the spamminess of a document (as computed by the Waterloo team) and its readability (modeled by the fraction of stop words in the document) are very important for improving precision at the top ranks.
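
As a rough illustration of the two features named in the summary, the sketch below computes a stop-word fraction as a readability proxy and folds it, together with a spam score, into a base relevance score. It is not the authors' method: the stop-word list, the `spamminess` scale (assumed here to lie in [0, 1], with higher values meaning more spam-like), the weights, and the linear combination in `quality_biased_score` are all hypothetical choices made for this example.

```python
import re

# Tiny illustrative stop-word list (an assumption; the summary does not say which list was used).
STOP_WORDS = frozenset({
    "a", "an", "and", "the", "of", "to", "in", "is", "it", "that",
    "for", "on", "with", "as", "are", "was", "be", "by", "this", "or",
})


def stop_word_fraction(text):
    """Return the share of tokens in `text` that are stop words (0.0 for empty text)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in STOP_WORDS for token in tokens) / len(tokens)


def quality_biased_score(relevance, spamminess, text, w_spam=0.5, w_read=0.5):
    """Hypothetical linear combination: reward readability, penalize spamminess.

    `spamminess` is assumed to be in [0, 1], higher meaning more spam-like.
    """
    return relevance + w_read * stop_word_fraction(text) - w_spam * spamminess


if __name__ == "__main__":
    natural = "the cat is on the mat"
    stuffed = "cheap pills cheap pills buy"
    print(stop_word_fraction(natural))
    print(stop_word_fraction(stuffed))
    print(quality_biased_score(1.0, 0.0, natural))
    print(quality_biased_score(1.0, 0.9, stuffed))
```

With equal base relevance, the naturally written text scores higher than the keyword-stuffed one under these assumed weights, which is the kind of reordering at the top ranks that a quality-biased ranker is meant to produce.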

  • Format: PDF
  • Size: 247.33 KB