High Performance State-Machine Replication
State-machine replication is a well-established approach to fault tolerance. The idea is to replicate a service on multiple servers so that it remains available despite the failure of one or more servers. From a performance perspective, state-machine replication has two limitations. First, it adds overhead to service response time, because commands must be totally ordered before they are executed. Second, service throughput cannot be increased by adding replicas to the system, since every replica executes every command. The authors address both issues in this paper. They use speculative execution to reduce response time and state partitioning to increase the throughput of state-machine replication. They illustrate these techniques with a highly available parallel B-tree service.
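To make the idea of state partitioning concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy key-value service rather than a B-tree. The names `Replica`, `Partition`, `partition_for`, and `submit` are hypothetical, and the loop in `deliver` is only a stand-in for a real total-order (atomic broadcast) protocol. Speculative execution is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A deterministic state machine; here, a toy key-value store."""
    state: dict = field(default_factory=dict)
    applied: int = 0  # number of commands this replica has executed

    def apply(self, command):
        op, key, value = command
        if op == "put":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)
        self.applied += 1


@dataclass
class Partition:
    """One slice of the service state, replicated on several servers."""
    replicas: list

    def deliver(self, command):
        # Stand-in for total-order delivery (assumption): every replica of
        # this partition receives the same commands in the same order.
        for replica in self.replicas:
            replica.apply(command)


def partition_for(key, num_partitions):
    # Hypothetical routing rule: sum of the key's bytes modulo the number
    # of partitions. Deterministic, unlike Python's salted hash() on str.
    return sum(key.encode()) % num_partitions


def build_service(num_partitions=2, replicas_per_partition=3):
    return [
        Partition([Replica() for _ in range(replicas_per_partition)])
        for _ in range(num_partitions)
    ]


def submit(service, command):
    # Each command is ordered and executed only within the partition
    # that owns its key, not by every server in the system.
    _, key, _ = command
    service[partition_for(key, len(service))].deliver(command)


service = build_service()
commands = [
    ("put", "a", 1),
    ("put", "b", 2),
    ("put", "c", 3),
    ("delete", "a", None),
    ("put", "d", 4),
]
for command in commands:
    submit(service, command)

for index, partition in enumerate(service):
    states = [replica.state for replica in partition.replicas]
    print(index, states[0], all(s == states[0] for s in states))
```

In this sketch, replicas within a partition end up with identical state because they apply the same deterministic commands in the same order, while each partition executes only the commands for its own keys. That second property is what lets throughput grow with the number of partitions, under the assumption that commands touch a single partition.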