Fault Tolerance on Multicore Processors using Deterministic Multithreading

Provided by: Delft University of Technology
Topic: Data Centers
Format: PDF
In this paper, the authors describe a software-based fault tolerance approach for multithreaded programs running on multicore processors. Redundant multithreaded processes are used to detect soft errors and recover from them. The scheme ensures that the redundant processes execute identically even in the presence of non-determinism caused by shared memory accesses. It does so by making the redundant processes acquire the locks that protect shared memory in the same order. Instead of using a record/replay technique, the scheme relies on deterministic multithreading, meaning that for the same input, a multithreaded program always has the same lock interleaving.
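
The idea of a fixed lock interleaving can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes a simple round-robin policy in which threads enter the critical section strictly in thread-ID order and every thread acquires the lock the same number of times. The names RoundRobinLock, run_replica and replicas_agree are hypothetical, and each "replica" here is a run inside one Python process rather than a separate redundant process.

```python
import threading


class RoundRobinLock:
    """Hypothetical lock that admits threads strictly in thread-ID order."""

    def __init__(self, num_threads):
        self._num_threads = num_threads
        self._turn = 0  # ID of the only thread allowed to enter next
        self._cond = threading.Condition()

    def acquire(self, thread_id):
        with self._cond:
            # Block until it is this thread's turn, whatever the OS scheduler does.
            self._cond.wait_for(lambda: self._turn == thread_id)

    def release(self, thread_id):
        with self._cond:
            # Hand the turn to the next thread ID and wake the waiters.
            self._turn = (thread_id + 1) % self._num_threads
            self._cond.notify_all()


def run_replica(num_threads=4, rounds=3):
    """Run one replica; return the order in which threads entered the critical section."""
    lock = RoundRobinLock(num_threads)
    order = []  # shared state, only touched while holding the turn

    def worker(thread_id):
        # Assumption: every thread performs the same number of acquisitions,
        # otherwise plain round-robin would stall on a finished thread.
        for _ in range(rounds):
            lock.acquire(thread_id)
            order.append(thread_id)  # critical section
            lock.release(thread_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


def replicas_agree(first, second):
    """Stand-in for the replica comparison: a mismatch would signal divergence."""
    return first == second
```

Calling replicas_agree(run_replica(), run_replica()) compares two runs on the same input. Because the acquisition order depends only on thread IDs and not on timing, the two runs produce the same interleaving, so in a redundant setup any remaining difference between replicas could be attributed to a fault rather than to scheduling.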
