ObjectRunner: Lightweight, Targeted Extraction and Querying of Structured Web Data
In this paper, the authors present ObjectRunner, a system for extracting, integrating and querying structured data from the Web. The system harvests real-world items from template-based HTML pages (the so-called structured Web). It illustrates a two-phase querying of the Web: first, an intensional description of the targeted data is provided, in a flexible and widely applicable manner; ObjectRunner then follows a lightweight, best-effort approach to extraction that leverages both this input description and the structure of the sources. The process is domain-independent, in the sense that it applies to any relation, flat or nested, that describes real-world items.
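The sketch below illustrates what an intensional description of a flat or nested relation might look like. The relation is described by its structure rather than by enumerating its instances. It is a minimal Python sketch under stated assumptions, not ObjectRunner's actual description language. The classes `Atomic`, `TupleType` and `SetOf`, the function `matches`, and the concert example are all hypothetical. The check only validates the shape of an already-extracted record and performs no extraction from HTML pages.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical building blocks for describing a target relation intensionally.
@dataclass
class Atomic:
    name: str  # e.g. "artist" or "date"; how values are recognized is not modeled


@dataclass
class TupleType:
    name: str
    fields: list  # Atomic, TupleType or SetOf components


@dataclass
class SetOf:
    name: str
    element: TupleType | Atomic  # nesting: a set of sub-items


def matches(desc, value) -> bool:
    """Return True if an extracted value conforms to the description's shape."""
    if isinstance(desc, Atomic):
        # Hypothetical rule: an atomic value is any non-blank string.
        return isinstance(value, str) and value.strip() != ""
    if isinstance(desc, TupleType):
        # Every declared field must be present and conform recursively.
        return isinstance(value, dict) and all(
            f.name in value and matches(f, value[f.name]) for f in desc.fields
        )
    if isinstance(desc, SetOf):
        # A set is a list whose elements all conform (an empty set is accepted).
        return isinstance(value, list) and all(matches(desc.element, v) for v in value)
    return False


# Hypothetical nested relation: a concert with a set of venues.
concert = TupleType(
    "concert",
    [
        Atomic("artist"),
        Atomic("date"),
        SetOf("venues", TupleType("venue", [Atomic("name"), Atomic("city")])),
    ],
)

item = {
    "artist": "Some Band",
    "date": "2010-09-13",
    "venues": [{"name": "Hall A", "city": "Singapore"}],
}
print(matches(concert, item))
```

A flat relation is the special case of a `TupleType` whose fields are all `Atomic`. Because the nested case only adds `SetOf`, the same recursive check covers both, in the spirit of the domain-independent approach described above.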