Effective Web Scraping with OXPath

Provided by: Association for Computing Machinery
Topic: Data Management
Format: PDF
Even in the third decade of the web, scraping websites remains a challenging task: most scraping programs are still developed as ad-hoc solutions built on a complex stack of languages and tools. Where comprehensive extraction solutions exist, they are expensive, heavyweight, and proprietary. OXPath is a minimalistic wrapping language that is nevertheless expressive and versatile enough for a wide range of scraping tasks. In this paper, the authors introduce the reader to a new scraping paradigm, declarative navigation. Instead of complex scripting or heavyweight, limited visual tools, OXPath turns scraping into a simple two-step process: select the relevant nodes with an XPath expression, then specify which action to apply to those nodes.
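The select-then-act split described above can be illustrated with a small Python sketch. This is not OXPath syntax and not the authors' implementation. It assumes a static toy document and uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree`. The names `fill`, `extract_text`, and `run` are hypothetical helpers invented for illustration. Each step pairs a path expression (which nodes) with an action (what to do with them).

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Toy, static page (an assumption for illustration only); the paper's
# setting is real web pages, which this sketch does not model.
PAGE = """
<html>
  <body>
    <form><input name="q" value=""/></form>
    <div class="result"><h3>Paper A</h3></div>
    <div class="result"><h3>Paper B</h3></div>
  </body>
</html>
""".strip()

# An action takes one selected node and returns a result.
Action = Callable[[ET.Element], object]


def fill(text: str) -> Action:
    """Hypothetical 'fill' action: write text into the node's value attribute."""
    def act(node: ET.Element) -> object:
        node.set("value", text)
        return node.get("value")
    return act


def extract_text(node: ET.Element) -> object:
    """Hypothetical 'extract' action: return the node's concatenated text."""
    return "".join(node.itertext()).strip()


def run(root: ET.Element, steps: list[tuple[str, Action]]) -> list[list[object]]:
    """Hypothetical driver: for each (path, action) step, select then act."""
    results = []
    for path, action in steps:
        nodes = root.findall(path)                   # step 1: pick the nodes
        results.append([action(n) for n in nodes])  # step 2: apply the action
    return results


root = ET.fromstring(PAGE)
out = run(root, [
    (".//input[@name='q']", fill("web scraping")),
    (".//div[@class='result']/h3", extract_text),
])
print(out)  # [['web scraping'], ['Paper A', 'Paper B']]
```

Keeping the selection in a path expression and the behavior in a separate action is the point of the sketch: the script is just a list of (where, what) pairs, not imperative traversal code.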
