Improving MapReduce Performance Through Data Placement in Heterogeneous Hadoop Clusters

Executive Summary

MapReduce has become an important distributed processing model for large-scale data-intensive applications such as data mining and web indexing. Hadoop, an open-source implementation of MapReduce, is widely used for short jobs that require low response time. The current Hadoop implementation assumes that the computing nodes in a cluster are homogeneous. Data locality is not taken into account when launching speculative map tasks, because most map tasks are assumed to be data-local. Unfortunately, neither the homogeneity assumption nor the data-locality assumption holds in virtualized data centers.
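To make the heterogeneity problem concrete, the sketch below illustrates one simple way to place file blocks in proportion to each node's processing capacity, so that faster nodes host more data and fewer map tasks need remote reads. This is a minimal illustration, not the paper's algorithm; the function name, node names, and speed ratios are hypothetical.

```python
# Hedged sketch: capacity-proportional block placement for a heterogeneous
# cluster. All names and speed ratios here are hypothetical examples.

def place_blocks(num_blocks, node_speeds):
    """Assign num_blocks file blocks to nodes in proportion to each
    node's relative processing speed."""
    total = sum(node_speeds.values())
    # Ideal (fractional) share of blocks for each node.
    shares = {n: num_blocks * s / total for n, s in node_speeds.items()}
    # Start with the integer part of each share.
    placement = {n: int(share) for n, share in shares.items()}
    # Hand out the leftover blocks to the nodes whose fractional
    # remainders are largest (largest-remainder rounding).
    leftover = num_blocks - sum(placement.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - placement[n],
                          reverse=True)
    for n in by_remainder[:leftover]:
        placement[n] += 1
    return placement

# Example: one node three times as fast as the other two, sharing 10 blocks.
print(place_blocks(10, {"node-a": 3.0, "node-b": 1.0, "node-c": 1.0}))
# → {'node-a': 6, 'node-b': 2, 'node-c': 2}
```

Under a uniform placement each node would hold roughly the same number of blocks, so the fast node would exhaust its local data early and either idle or pull blocks over the network; proportional placement avoids both.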
