Cloud

Scheduling Data Intensive Workloads Through Virtualization on MapReduce Based Clouds

Download Now Free registration required

Executive Summary

MapReduce has become a popular programming model for running data intensive applications on the cloud. Completion time goals or deadlines of MapReduce jobs set by users are becoming crucial in existing cloud-based data processing environments like Hadoop. There is a conflict between the scheduling MR jobs to meet deadlines and "Data locality" (assigning tasks to nodes that contain their input data). To meet the deadline a task may be scheduled on a node without local input data for that task causing expensive data transfer from a remote node. In this paper, a novel scheduler is proposed to address the above problem which is primarily based on the dynamic resource reconfiguration approach.

  • Format: PDF
  • Size: 495.4 KB