
Wrapper Maintenance


Executive Summary

A Web wrapper is a software application that extracts information from a semi-structured source and converts it to a structured format. Although semi-structured sources such as Web pages contain no explicitly specified schema, they do have an implicit grammar that can be used to identify relevant information in the document. A wrapper learning system analyzes page layout to generate either grammar-based or landmark-based extraction rules, which wrappers then use to extract data. Because these rules depend on page layout, even slight layout changes can break a wrapper and prevent it from extracting data correctly. Wrapper maintenance is a composite task with two parts: verification, which checks that the wrapper continues to extract data correctly from a source, and repair, which updates the wrapper so that it works on the changed pages.
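To make the summary concrete, the following is a minimal sketch of a landmark-based rule, a verification check, and a repair. It is an illustration under stated assumptions, not the method the paper describes: the `LandmarkRule` class, the `verify` function, the sample pages, and the price pattern are all hypothetical, and the repaired rule is written by hand here, whereas a maintenance system would aim to derive it automatically.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class LandmarkRule:
    """Hypothetical rule: the target field lies between two literal landmarks."""
    start: str
    end: str

    def extract(self, page: str) -> Optional[str]:
        # Locate the start landmark; give up if the layout no longer contains it.
        i = page.find(self.start)
        if i == -1:
            return None
        i += len(self.start)
        # The field ends at the first end landmark after the start landmark.
        j = page.find(self.end, i)
        if j == -1:
            return None
        return page[i:j].strip()


def verify(value: Optional[str], pattern: str) -> bool:
    """Verification: does the extracted value still look like the expected data?"""
    return value is not None and re.fullmatch(pattern, value) is not None


# Assumed shape of a correctly extracted price, e.g. "$19.99".
PRICE_PATTERN = r"\$\d+\.\d{2}"

# Made-up pages: the same data before and after a layout change.
old_page = "<p><b>Price:</b> $19.99<br>In stock</p>"
new_page = "<p><span class='label'>Price</span> <em>$19.99</em></p>"

rule = LandmarkRule(start="<b>Price:</b>", end="<br>")
old_value = rule.extract(old_page)
new_value = rule.extract(new_page)
old_ok = verify(old_value, PRICE_PATTERN)
new_ok = verify(new_value, PRICE_PATTERN)

# Repair: new landmarks that locate the same field on the changed layout
# (chosen by hand for this sketch).
repaired_rule = LandmarkRule(start="<em>", end="</em>")
repaired_value = repaired_rule.extract(new_page)
repaired_ok = verify(repaired_value, PRICE_PATTERN)
```

The original rule succeeds on the old page and fails on the new one because its start landmark has disappeared, even though the price itself is unchanged. Checking extracted values against an expected pattern is one simple way to detect such a failure; it would not catch a rule that extracts a well-formed but wrong value.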

  • Format: PDF
  • Size: 113.4 KB