Practical and Low-Overhead Masking of Failures of TCP-Based Servers
This paper describes an architecture that allows a replicated service to survive crashes without breaking its TCP connections. The approach does not require modifications to the TCP protocol, to the operating system on the server, or to any of the software running on the clients. Furthermore, it runs on commodity hardware. The paper compares two implementations of this architecture, one based on primary/backup replication and the other on message logging, focusing on scalability, failover time, and application transparency. It evaluates three types of services: a file server, a Web server, and a multimedia streaming server. The experiments suggest that the approach incurs low overhead on throughput, scales well as the number of clients increases, and allows recovery of the service in near-optimal time.