Fault Tolerance in Multi-Core Processors Using Flexible Redundant Threading

Date Added: Jul 2014
Format: PDF

In this paper, the authors make a case for incorporating fault tolerance into desktop multi-core processors, covering both transient faults and errors due to misspeculation. Their method enforces fault tolerance through redundant threading: the commit results of the speculative original thread are verified against those of a non-speculative redundant thread, which is a delayed version of the original. The method differs from previously proposed redundant-execution fault-tolerant designs, including Active-stream/Redundant-stream Simultaneous Multithreading (AR-SMT), the Simultaneous and Redundantly Threaded (SRT) processor, and Chip-level Redundant Threading (CRT), in that it covers errors due to misspeculation, supports dynamic hardware configuration, reuses resources for normal computation, and allows flexible enforcement of fault tolerance.
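
The commit-time check described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the function names (`execute`, `run_redundant`), the two-operation instruction set, the `delay` parameter, and the single-bit fault injection are all hypothetical and chosen for illustration. The leading thread's results wait in a queue until the delayed redundant thread re-executes the same instruction and compares the two values.

```python
from collections import deque


def execute(op, a, b):
    """Toy ALU standing in for one core's execution of an instruction."""
    if op == "add":
        return a + b
    if op == "mul":
        return a * b
    raise ValueError(f"unknown op: {op}")


def run_redundant(program, fault_at=None, delay=2):
    """Return the index of the first mismatching commit, or None.

    program  : list of (op, a, b) tuples (hypothetical instruction format)
    fault_at : index at which a bit flip is injected into the leading
               thread's result (assumption, models a transient fault)
    delay    : how many instructions the redundant thread lags behind
    """
    pending = deque()  # results of the leading thread awaiting verification

    def check_oldest():
        idx, leading_result = pending.popleft()
        op, a, b = program[idx]
        redundant_result = execute(op, a, b)  # delayed re-execution
        return idx if redundant_result != leading_result else None

    for i, (op, a, b) in enumerate(program):
        result = execute(op, a, b)
        if i == fault_at:
            result ^= 1  # flip the lowest bit of the leading result
        pending.append((i, result))
        if len(pending) > delay:
            bad = check_oldest()
            if bad is not None:
                return bad

    # Drain the results still waiting once the leading thread finishes.
    while pending:
        bad = check_oldest()
        if bad is not None:
            return bad
    return None


program = [("add", 1, 2), ("mul", 3, 4), ("add", 5, 6), ("mul", 2, 2)]
print(run_redundant(program))               # fault-free run
print(run_redundant(program, fault_at=2))   # fault injected at index 2
```

In this model a mismatch simply reports the offending instruction index. How the proposed design recovers from a detected error, and how it treats misspeculation, is not specified in the abstract and is left out of the sketch.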