The authors present approaches for exploiting data parallelism in XML processing pipelines through novel strategies for compiling them to the MapReduce framework. Pipelines in this approach consist of sequences of processing steps that consume XML-structured data and produce modified (i.e., updated) XML structures, often through calls to "black-box" functions. The main contributions are a set of strategies for compiling such XML pipelines into parallel MapReduce networks and a discussion of their advantages and tradeoffs. The authors also present a detailed experimental evaluation of these approaches using the Hadoop MapReduce system as the implementation platform. The results show that the compilation strategies can significantly reduce the execution times of XML pipelines.
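As a rough illustration of the general idea, and not one of the authors' compilation strategies, the sketch below treats a single pipeline step as a map phase that applies a black-box function to each XML fragment independently, followed by a reduce phase that reassembles the updated fragments in document order. All names here (`black_box`, `map_phase`, `reduce_phase`, `run_pipeline`) and the `<data>`/`<record>` layout are hypothetical, and the code runs in a single process rather than on Hadoop.

```python
import xml.etree.ElementTree as ET


def black_box(record):
    # Hypothetical black-box step: returns an updated copy of one element.
    updated = ET.Element(record.tag, record.attrib)
    updated.text = (record.text or "").upper()
    return updated


def map_phase(root, step):
    # Map: apply the step to each child fragment independently.
    # The position index serves as the key so that order can be restored.
    # On a real cluster, each call could run on a different worker.
    return [(index, step(child)) for index, child in enumerate(root)]


def reduce_phase(root_tag, root_attrib, pairs):
    # Reduce: collect the updated fragments under one root, in document order.
    root = ET.Element(root_tag, root_attrib)
    for _, element in sorted(pairs, key=lambda pair: pair[0]):
        root.append(element)
    return root


def run_pipeline(xml_text, steps):
    # Run each pipeline step as one map phase followed by one reduce phase.
    root = ET.fromstring(xml_text)
    for step in steps:
        root = reduce_phase(root.tag, dict(root.attrib), map_phase(root, step))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(run_pipeline("<data><record>a</record><record>b</record></data>", [black_box]))
```

Because each fragment is processed independently in the map phase, the work can in principle be spread across many machines; how finely to split the document and how to regroup the results between steps is the kind of tradeoff the compilation strategies address.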