REDAC: Distributed, Asynchronous Redundancy in Shared Memory Servers
Source: Carnegie Mellon University
The emergence of multi-core architectures, driven by continued technology scaling, has led to concerns about increasing soft- and hard-error rates in commodity designs. Because modern chip designs consist of multiple high-speed clock domains, conventional lockstepped redundant execution is no longer practical. Recent work suggests an asynchronous approach to redundant execution, in which processor pairs independently execute the same instruction stream and treat any difference between their results as a soft error, invoking rollback recovery. Because prior designs buffer instruction results within the out-of-order instruction window, they are limited to tightly coupled redundancy within a single chip, which restricts availability and serviceability in the presence of hard errors.
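
The asynchronous scheme described above (two processors run the same instruction stream independently, and any mismatch triggers rollback recovery) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism of REDAC or of the prior designs: the register-file dictionary, the add-only instruction format, the per-interval comparison granularity, and the names `run_interval` and `redundant_execute` are all hypothetical and chosen for illustration.

```python
import copy


def run_interval(state, instructions, flip_at=None):
    """Execute one interval on a private copy of the checkpointed state.

    Each instruction is a hypothetical (dst, src_a, src_b) add. If flip_at is
    given, a single bit flip is injected after that instruction to mimic a
    transient (soft) error.
    """
    regs = copy.deepcopy(state)
    for index, (dst, src_a, src_b) in enumerate(instructions):
        regs[dst] = regs[src_a] + regs[src_b]
        if index == flip_at:
            regs[dst] ^= 1  # assumed fault model: one flipped low-order bit
    return regs


def redundant_execute(state, intervals, faults=None):
    """Run every interval on two independent replicas and compare results.

    faults maps an interval index to the instruction index at which replica B
    suffers a one-time bit flip. A mismatch discards both results and
    re-executes the interval from the last committed state (rollback).
    """
    pending_faults = dict(faults or {})  # copy so the caller's dict is untouched
    rollbacks = 0
    for k, instructions in enumerate(intervals):
        while True:
            result_a = run_interval(state, instructions)
            result_b = run_interval(
                state, instructions, flip_at=pending_faults.pop(k, None)
            )
            if result_a == result_b:
                state = result_a  # results agree: commit as the new checkpoint
                break
            rollbacks += 1  # results differ: treat as a soft error and retry
    return state, rollbacks


initial = {"r0": 1, "r1": 2, "r2": 0}
program = [
    [("r2", "r0", "r1")],
    [("r0", "r2", "r1"), ("r1", "r0", "r2")],
]
clean_state, clean_rollbacks = redundant_execute(initial, program)
faulty_state, faulty_rollbacks = redundant_execute(initial, program, faults={1: 0})
```

In this model the two replicas never synchronize cycle by cycle; they only agree on results at interval boundaries. A transient fault costs one re-execution of the affected interval and leaves the final state unchanged. A persistent (hard) error would make the retry loop spin forever, which is one way to see why hard errors need a different answer than rollback alone.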