Optimizing Content Freshness of Relations Extracted From the Web Using Keyword Search

Date Added: Jun 2010
Format: PDF

An increasing number of applications operate on data obtained from the Web. These applications typically maintain local copies of the web data to avoid network latency in data accesses. As the data on the Web evolves, it is critical that the local copy be kept up to date. Data freshness is one of the most important data quality issues and has been extensively studied for various applications, including web crawling. However, web crawling focuses on obtaining as many raw web pages as possible, whereas these applications are interested in specific content from specific data sources.