Association for Computing Machinery
MapReduce has emerged as a promising architecture for large scale data analytics on commodity clusters. The rapid adoption of Hive, a SQL-like data processing language on Hadoop shows the increasing importance of processing structured data on MapReduce platforms. MapReduce offers several attractive properties such as the use of low-cost hardware, fault-tolerance, scalability, and elasticity. However, these advantages have required a substantial performance sacrifice. In this paper, the authors introduce Clydesdale, a novel system for structured data processing on Hadoop - a popular implementation of MapReduce.