Association for Computing Machinery
MapReduce has emerged as a prevailing distributed computation paradigm for enterprise and large-scale data-intensive computing. The model is also increasingly used in the massively-parallel cloud environment, where MapReduce jobs are run on a set of Virtual Machines (VMs) on pay-as-needed basis. However, MapReduce jobs suffer from performance degradation when running in the cloud due to inefficient resource allocation. In particular, the MapReduce model is designed for and leverages information from the native clusters to operate efficiently, whereas the cloud presents a virtual cluster topology overlying or hiding actual network information.