Shared Cluster Scheduling: A Fair and Efficient Protocol
In this paper, the authors address the problem of resource allocation in a shared cluster used for data-intensive scalable computing. Specifically, they target Hadoop, the open-source implementation of the MapReduce framework, and design a new scheduling algorithm that achieves both fair and efficient utilization of a shared cluster. Their scheduler, called FSP, reaches both goals by "focusing" the resources of the cluster on individual jobs rather than partitioning the cluster so that all submitted jobs run in parallel. The authors discuss in detail the implementation challenges posed by their scheduler, and propose a new technique for achieving preemption through job suspension.
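To make concrete why "focusing" the cluster on one job at a time can beat fair partitioning, here is a toy model (not the paper's implementation): it assumes jobs of known size arriving together on a unit-rate cluster, and compares completion times under equal sharing versus serving the whole cluster to the shortest job first. The function names `processor_sharing` and `focused_sjf` are illustrative, not from the paper.

```python
def processor_sharing(sizes):
    """Completion times when all jobs run in parallel, each
    receiving an equal 1/n share of the cluster (fair partitioning)."""
    remaining = sorted(sizes)
    t, done = 0.0, []
    while remaining:
        n = len(remaining)
        # time until the smallest remaining job finishes at rate 1/n
        t += remaining[0] * n
        done.append(t)
        s = remaining.pop(0)
        remaining = [r - s for r in remaining]
    return done

def focused_sjf(sizes):
    """Completion times when the whole cluster is devoted to one
    job at a time, shortest job first (the 'focusing' idea)."""
    t, done = 0.0, []
    for s in sorted(sizes):
        t += s
        done.append(t)
    return done

sizes = [1.0, 2.0, 4.0]
print(processor_sharing(sizes))  # → [3.0, 5.0, 7.0]
print(focused_sjf(sizes))        # → [1.0, 3.0, 7.0]
```

In this toy example both disciplines are work-conserving, so the last job finishes at the same time, but focusing finishes the short jobs much earlier and so lowers the mean sojourn time; this is the intuition behind trading cluster partitioning for job-at-a-time scheduling, with preemption (here implicit in the sorting) needed to keep the policy fair when jobs arrive over time.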