Deriving Dynamics of Web Pages: A Survey

The World Wide Web is dynamic by nature: content is continuously added, deleted, or changed. This makes it challenging for Web crawlers to keep up to date with the current version of a Web page, all the more so because not all apparent changes are significant. The authors review the major approaches to change detection in Web pages and to the extraction of temporal properties (especially timestamps) of Web pages. They focus on techniques and systems proposed in the last ten years and analyze them to gain insight into the practical solutions and best practices available.

Provided by: Telecom ParisTech
Topic: Data Management
Date Added: Mar 2011
Format: PDF
