Reducing Overhead for Soft Error Coverage in High Availability Systems
Source: University of Wisconsin
High reliability/availability systems typically use redundant computation and components to detect, isolate, and recover from faults. Chip multiprocessors (CMPs) incorporate multiple identical components on a chip to provide high performance per watt. These identical components can be used in a redundant configuration to build cost-effective high availability systems. Current high availability systems like NonStop and Stratus replicate all components of the system, including cores, system circuitry, and memory. However, it would likely be infeasible to replicate all the components of a system in a low-cost, commodity CMP-based high availability system. In this paper, the authors focus on low-overhead techniques to detect logic soft errors in high availability systems.