University of Washington School of Public Health & Community Medicine
Cloud computing is by far the most cost-effective technology for hosting Internet-scale services and applications. The MapReduce model, in particular, is now widely used in cloud infrastructures to meet the demands of large-scale data-intensive and computation-intensive applications. Despite its success, the implications of MapReduce for the management of cloud workloads and cluster resources remain largely unstudied. In this paper, the authors show that dealing with the heterogeneity of workloads and machine capabilities is a key challenge. In today's cloud environments, workloads vary in size, length, resource requirements, and arrival rate.