Mitigating the Negative Impact of Preemption on Heterogeneous MapReduce Workloads
Modern production clusters are often shared by multiple types of jobs with different priorities in order to improve resource utilization. Preemption is a common technique employed by MapReduce schedulers to avoid delaying production jobs while allowing the cluster to be shared by non-production jobs. It also prevents a large job from occupying too many resources and starving others. Recent literature shows that jobs in production MapReduce clusters have a mixture of lengths and sizes spanning many orders of magnitude. In such environments, the current preemption policy used by MapReduce schedulers can significantly delay the completion of long-running tasks, resulting in wasted resources.