Association for Computing Machinery
MapReduce provides a highly simplified programming model that lets users run programs in a distributed fashion by implementing mapper and reducer functions, without handling data placement or task scheduling. Although HiveQL offers features similar to SQL, mapping complex SQL queries into HiveQL remains difficult, and manual translation often leads to poor performance. QMapper is a tool developed to address this problem; it applies query rewriting rules and cost-based evaluation of MapReduce flows on the basis of column statistics. Evaluation demonstrates that, while preserving correctness, QMapper improves performance by up to 42% in terms of execution time.
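As an illustration of the mapper and reducer model mentioned above, the following minimal sketch shows how a simple SQL aggregation could be expressed as a map step and a reduce step. It is a single-process toy, not QMapper's translation or Hive's execution engine; the query, the table layout, and the names `mapper`, `reducer`, and `run_mapreduce` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical query used for illustration only:
#   SELECT dept, SUM(salary) FROM emp GROUP BY dept

def mapper(row):
    # Emit one (group key, value) pair per input row.
    yield row["dept"], row["salary"]

def reducer(key, values):
    # Aggregate all values that share the same key.
    yield key, sum(values)

def run_mapreduce(rows, map_fn, reduce_fn):
    # Shuffle phase: group mapper output by key (done in memory here;
    # a real framework distributes this across machines).
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_fn(row):
            groups[key].append(value)
    # Reduce phase: process keys in sorted order for deterministic output.
    result = []
    for key in sorted(groups):
        result.extend(reduce_fn(key, groups[key]))
    return result

rows = [
    {"dept": "eng", "salary": 100},
    {"dept": "ops", "salary": 70},
    {"dept": "eng", "salary": 120},
]
totals = run_mapreduce(rows, mapper, reducer)
```

In this sketch the user supplies only `mapper` and `reducer`; grouping and ordering are left to the driver, which mirrors how the framework hides data placement and scheduling from the programmer.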