University of Peloponnese
While a traditional Hadoop deployment assumes a homogeneous cluster, many enterprise clusters grow incrementally over time and may contain a variety of different servers. This node heterogeneity poses an additional challenge for efficient cluster and job management. Because of resource heterogeneity, it is often unclear which resources introduce inefficiency and bottlenecks, and how such a Hadoop cluster should be configured and optimized. In this paper, the authors explore the efficiency and accuracy of the bounds-based performance model for predicting MapReduce job completion times in heterogeneous Hadoop clusters.