Georgia Institute of Technology
As the role of highly parallel accelerators becomes more important in high-performance computing, so does the need to ensure their reliable operation. In applications where precision and correctness are necessities, bit-level reliable operation is required. While mechanisms for error detection and correction exist, their cost-effective implementation in massively parallel accelerators is still an active area of research. In this paper the authors present an alternative software-based approach for improving the reliability of massively parallel bulk-synchronous processors such as modern GPUs.