Association for Computing Machinery
MapReduce and Hadoop represent an economically compelling alternative for efficient large-scale data processing and cost-effective analytics over "Big Data" in the enterprise. A slew of interesting applications associated with live business intelligence require completion time guarantees. While there have been a few research efforts to design different models and approaches for predicting the performance of MapReduce applications, this question remains a challenging research problem. Some of the past modeling efforts aim to predict the job completion time by analyzing the distribution of map and reduce task execution times, and deriving scaling factors for these execution times when the original MapReduce application is applied to a larger dataset.
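The scaling-factor idea mentioned above can be illustrated with a minimal sketch. This is not the paper's model: the function name, the slot counts, and the assumption that the total work in each phase grows linearly with input size (while per-task durations stay roughly constant) are all illustrative simplifications.

```python
# Hedged sketch: predict MapReduce job completion time on a larger input
# by linearly scaling the total work measured in each phase. All names
# and the linear-scaling assumption are illustrative, not from the paper.

def predict_completion_time(map_durations, reduce_durations,
                            map_slots, reduce_slots, scale):
    """Estimate completion time on a dataset `scale` times larger.

    Treats each phase's total work as scaling linearly with input size,
    spread evenly across the available execution slots (a lower-bound
    style estimate; real schedulers add stragglers and overlap effects).
    """
    map_work = sum(map_durations) * scale        # scaled map-phase work
    reduce_work = sum(reduce_durations) * scale  # scaled reduce-phase work
    return map_work / map_slots + reduce_work / reduce_slots

# Example: 100 map tasks of ~10s and 20 reduce tasks of ~30s, measured
# on a small input; predict for a 4x larger input on 10 map slots and
# 5 reduce slots.
maps = [10.0] * 100
reduces = [30.0] * 20
estimate = predict_completion_time(maps, reduces, 10, 5, 4.0)
```

In this toy run the estimate is (1000 s x 4) / 10 + (600 s x 4) / 5 = 880 s; more refined models in the literature replace the uniform-slot assumption with bounds derived from the measured task-duration distribution.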