In this paper, the authors present ObjectRunner, a system for extracting, integrating and querying structured data from the Web. The system harvests real-world items from template-based HTML pages (the so-called structured Web). It illustrates a two-phase approach to querying the Web: first, an intentional description of the targeted data is provided, in a flexible and widely applicable manner; ObjectRunner then follows a lightweight, best-effort approach that leverages both the input description and the source structure. This process is domain-independent, in the sense that it applies to any relation, either flat or nested, describing real-world items.
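To make the notion of a flat versus nested relation concrete, here is a minimal sketch of how a description of the targeted data might be modeled. It is an illustration only, not ObjectRunner's actual input format: the class names `Atom` and `Relation`, the helpers `is_flat` and `depth`, and the example `concert` and `book` relations are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:
    """An atomic attribute such as a title or a price (hypothetical model)."""
    name: str


@dataclass(frozen=True)
class Relation:
    """A relation whose attributes are atoms or, when nested, other relations."""
    name: str
    attributes: Tuple[Union[Atom, "Relation"], ...]


def is_flat(rel: Relation) -> bool:
    """Return True when every attribute of the relation is atomic."""
    return all(isinstance(attr, Atom) for attr in rel.attributes)


def depth(rel: Relation) -> int:
    """Return the nesting depth: 1 for a flat relation, more when nested."""
    nested = [depth(attr) for attr in rel.attributes if isinstance(attr, Relation)]
    return 1 + max(nested, default=0)


# Assumed example of a flat relation: every attribute is atomic.
concert = Relation("concert", (Atom("artist"), Atom("date"), Atom("venue")))

# Assumed example of a nested relation: "authors" is itself a relation.
book = Relation(
    "book",
    (Atom("title"), Relation("authors", (Atom("name"),)), Atom("price")),
)
```

Under this model, `is_flat(concert)` holds while `is_flat(book)` does not, which mirrors the distinction the abstract draws between flat and nested relations.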
- Format: PDF
- Size: 761 KB