Evaluating Operating System Vulnerability to Memory Errors

Provided by: Association for Computing Machinery
Topic: Hardware
Format: PDF
Reliability is of great concern to the scalability of extreme-scale systems. Of particular concern are soft errors in main memory, which are a leading cause of failures on current systems and are predicted to be the leading cause on future systems. While great effort has gone into designing algorithms and applications that can continue to make progress in the presence of these errors without restarting, the most critical software running on a node, the Operating System (OS), is currently left relatively unprotected.