Date Added: Jan 2012
This paper describes the design and implementation of SecondSite, a cloud-based service for disaster tolerance. SecondSite extends the Remus virtualization-based high availability system by allowing groups of virtual machines to be replicated across data centers over wide-area Internet links. The goal of the system is to commodify the property of availability, exposing it as a simple tick box when configuring a new virtual machine. To achieve this in the wide area, the authors have had to tackle the related issues of replication traffic bandwidth, reliable failure detection across geographic regions and traffic redirection over a wide-area network without compromising on transparency and consistency.