Evaluating Operating System Vulnerability to Memory Errors

Reliability is of great concern to the scalability of extreme-scale systems. Of particular concern are soft errors in main memory, which are a leading cause of failures on current systems and are predicted to be the leading cause on future systems. While great effort has gone into designing algorithms and applications that can continue to make progress in the presence of these errors without restarting, the most critical software running on a node, the Operating System (OS), is currently left relatively unprotected.

Provided by: Association for Computing Machinery
Topic: Hardware
Date Added: Jun 2012
Format: PDF
