University of Central England in Birmingham
Reliability has become a key issue in computer system design as microprocessors grow increasingly susceptible to transient faults. Many previously proposed schemes exploit Simultaneous Multi-Threaded (SMT) architectures to achieve transient-fault tolerance by running a program concurrently on two threads: a main thread and a redundant checker thread. Such schemes, however, often incur high performance overheads due to resource contention and redundancy checking. In this paper, the authors propose Dual-Thread Execution (DTE) for SMT processors to achieve transient-fault tolerance efficiently. DTE is derived from the recently proposed Fault-Tolerant Dual-Core Execution (FTDCE) paradigm, in which two processor cores on a single chip perform redundant execution to improve both reliability and performance.