Reliability-Aware Scalability Models for High Performance Computing

Executive Summary

Scalability models are powerful analytical tools for evaluating and predicting the performance of parallel applications. However, existing scalability models do not quantify the impact of failures and therefore cannot accurately describe application performance when failures occur. In this paper, the authors extend two well-known models, Amdahl's law and Gustafson's law, to account for the impact of failures and the effect of fault tolerance techniques on applications. The derived reliability-aware models can be used to predict application scalability in environments where failures occur and to evaluate fault tolerance techniques. Trace-based simulations driven by real failure logs demonstrate that the newly developed models provide a better understanding of application performance and scalability in the presence of failures.
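
For reference, the two classical laws that the paper extends can be sketched as below. This covers only the failure-free baseline: the reliability-aware formulas are not stated in this summary and are not reproduced here. The function names and the `parallel_fraction` parameter are illustrative assumptions, not notation taken from the paper.

```python
def amdahl_speedup(parallel_fraction: float, n_procs: int) -> float:
    """Classical Amdahl's law: fixed problem size.

    parallel_fraction is the share of the serial runtime that can be
    parallelized (an assumed name, not the paper's notation).
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


def gustafson_speedup(parallel_fraction: float, n_procs: int) -> float:
    """Classical Gustafson's law: problem size scales with processor count.

    parallel_fraction is the share of the parallel runtime spent in
    parallelizable work (again an assumed name).
    """
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * n_procs


if __name__ == "__main__":
    # Compare the two failure-free predictions for a 95% parallel workload.
    for n in (1, 16, 256, 4096):
        a = amdahl_speedup(0.95, n)
        g = gustafson_speedup(0.95, n)
        print(f"N={n:5d}  Amdahl={a:8.2f}  Gustafson={g:8.2f}")
```

Neither function has a term for failures, checkpointing, or recovery time, which is the gap the paper's reliability-aware models are meant to fill.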

  • Format: PDF
  • Size: 216.1 KB