XML Query Optimization in Map-Reduce

Executive Summary

The authors present MRQL, a novel query language for large-scale analysis of XML data in a map-reduce environment. MRQL is expressive enough to capture most common data analysis tasks while remaining amenable to optimization. Its evaluation plans are built from a small number of higher-order physical operators that can be implemented directly on existing map-reduce systems such as Hadoop. The authors describe a prototype implementation and report preliminary results from evaluating MRQL queries on a small cluster of PCs running Hadoop.
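To make the idea of a higher-order operator concrete, here is a minimal single-machine sketch in Python. It is not MRQL syntax and not the authors' operator set. The `map_reduce` function, the sample XML, and the helper names are all hypothetical, chosen only to show an operator that takes the map and reduce functions as arguments.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def map_reduce(map_fn, reduce_fn, records):
    """Hypothetical higher-order operator (illustration only, not from the paper).

    map_fn(record) yields (key, value) pairs; reduce_fn(key, values)
    folds all values that share a key into one result.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # stands in for the shuffle phase
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Made-up sample document, assumed purely for illustration.
DOC = """<library>
  <book><author>Smith</author><year>2009</year></book>
  <book><author>Jones</author><year>2010</year></book>
  <book><author>Smith</author><year>2010</year></book>
</library>"""

books = ET.fromstring(DOC).findall("book")


def emit_author(book):
    # Map step: one (author, 1) pair per <book> element.
    yield book.findtext("author"), 1


def total(_key, values):
    # Reduce step: add up the counts for one author.
    return sum(values)


books_per_author = map_reduce(emit_author, total, books)
print(books_per_author)  # {'Smith': 2, 'Jones': 1}
```

On a real cluster the grouping step would be carried out by the framework's shuffle phase rather than an in-memory dictionary; the sketch only shows why passing functions as parameters lets a few operators cover many queries.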

  • Format: PDF
  • Size: 115.6 KB