Understanding the Propagation of Hard Errors to Software and Implications for Resilient System Design

Free registration required

Executive Summary

With continued CMOS scaling, future shipped hardware will be increasingly vulnerable to in-the-field faults. To be broadly deployable, the hardware reliability solution must incur low overheads, precluding use of expensive redundancy. The authors explore a cooperative hardware-software solution that watches for anomalous software behavior to indicate the presence of hardware faults. Fundamental to such a solution is a characterization of how hardware faults in different microarchitectural structures of a modern processor propagate through the application and OS.

  • Format: PDF
  • Size: 441.1 KB