Big Data

Improving Resource Utilization in MapReduce

Date Added: May 2012
Format: PDF

MapReduce has been widely adopted in both academia and industry to run large-scale data-parallel applications. In MapReduce, each slave node hosts a number of task slots to which tasks can be assigned, and these slots limit the maximum number of tasks that can execute concurrently on each node. When not all task slots of a node are in use, the resources "reserved" for the idle slots go unutilized. To improve resource utilization, the authors propose resource stealing, which enables running tasks to steal the resources reserved for idle slots and give them back proportionally whenever new tasks are assigned. Resource stealing puts otherwise wasted resources to use without interfering with normal job scheduling.
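
The idea in the abstract can be illustrated with a small bookkeeping sketch. This is not the authors' implementation. The class Node, its methods, the abstract "resource units" per slot, the even split when stealing, and the reading of "proportionally" as "in proportion to what each task has stolen" are all assumptions made for illustration.

```python
class Node:
    """Hypothetical model of one slave node; names and policy are assumptions."""

    def __init__(self, total_slots, units_per_slot=1.0):
        self.total_slots = total_slots
        self.units_per_slot = units_per_slot  # resources reserved per slot
        self.stolen = {}  # task id -> extra units taken beyond its own slot

    def idle_slots(self):
        return self.total_slots - len(self.stolen)

    def unclaimed(self):
        # Idle-slot reservation that no running task has stolen yet.
        return self.idle_slots() * self.units_per_slot - sum(self.stolen.values())

    def steal_idle(self):
        # Assumed policy: split the unclaimed idle reservation evenly.
        if not self.stolen:
            return
        share = self.unclaimed() / len(self.stolen)
        for task_id in self.stolen:
            self.stolen[task_id] += share

    def assign(self, task_id):
        if self.idle_slots() == 0:
            raise RuntimeError("no free task slot")
        # Units the new task still lacks after using unclaimed idle resources.
        need = max(0.0, self.units_per_slot - self.unclaimed())
        total_stolen = sum(self.stolen.values())
        if need > 0:
            # Each running task gives back in proportion to what it stole.
            for t, s in self.stolen.items():
                self.stolen[t] = s - need * s / total_stolen
        self.stolen[task_id] = 0.0

    def finish(self, task_id):
        # The finished task's slot and its stolen units become idle again.
        del self.stolen[task_id]

    def allocation(self, task_id):
        return self.units_per_slot + self.stolen[task_id]


node = Node(total_slots=4)
node.assign("a")
node.assign("b")
node.steal_idle()   # two idle slots are shared by "a" and "b"
node.assign("c")    # "a" and "b" each return part of what they stole
print({t: node.allocation(t) for t in ("a", "b", "c")})
```

In this sketch the new task always receives its full slot reservation on assignment, which mirrors the abstract's claim that stealing does not interfere with normal job scheduling. How the real system measures and enforces per-task resource use is outside what the abstract states.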