George Washington University
Modern data centers coordinate hundreds of thousands of heterogeneous tasks and aim to deliver highly reliable cloud computing services. Although offering equal reliability to all users benefits everyone at once, individual requirements vary dramatically, and users may find such an approach either inadequate or too expensive. In this paper, the authors propose a novel method for providing reliability as an elastic, on-demand service. Their scheme uses peer-to-peer checkpointing and allows users' reliability levels to be jointly optimized based on an assessment of their individual requirements and the total resources available in the data center.