StealthWorks: Emulating Memory Errors

A study of Google's data center revealed that the incidence of main memory errors is surprisingly high. These errors can corrupt applications and the system itself, reducing reliability. The high error rate indicates that new resiliency techniques will be vital in future memories. Developing such techniques requires a framework for conducting flexible and repeatable experiments. This paper describes such a framework, StealthWorks, which facilitates research on software resilience by behaviorally emulating memory errors in a live system. The authors illustrate its use in studying program tolerance to random errors and in developing a new software technique that continuously tests memory for errors.

Provided by: University of Pittsburgh
Topic: Data Centers
Date Added: Aug 2010
Format: PDF
