Probabilistic Communication and I/O Tracing With Deterministic Replay at Scale

On today's petascale supercomputers, applications often run inefficiently, for example because of poor communication and I/O performance, and analysis tools can diagnose such problems. However, these tools either produce extremely large trace files that complicate performance analysis or sacrifice accuracy by collecting only high-level statistics through crude averaging. This work contributes Scala-H-Trace, which compresses traces more aggressively than any previous approach, particularly for applications that do not show strict regularity in SPMD behavior. To capture variations, Scala-H-Trace uses histograms that express the probability distribution of arbitrary communication and I/O parameters.
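The histogram idea in the abstract can be illustrated with a small sketch. This is not Scala-H-Trace's implementation, which the abstract does not describe. It only shows, under assumed names and values, how a varying parameter such as message size could be stored as bin counts instead of one trace record per event, and how a fixed random seed makes a replay drawn from those counts repeatable. The functions `build_histogram` and `replay_samples`, the bin width, and the sample sizes are all hypothetical.

```python
import random
from collections import Counter


def build_histogram(values, bin_width):
    """Count how many observed values fall into each fixed-width bin."""
    return Counter(v // bin_width for v in values)


def replay_samples(hist, bin_width, n, seed=0):
    """Draw n values from the histogram.

    Seeding a private generator makes the drawn sequence identical
    on every run with the same seed (an assumed replay strategy).
    """
    rng = random.Random(seed)
    bins = sorted(hist)
    weights = [hist[b] for b in bins]
    chosen = rng.choices(bins, weights=weights, k=n)
    # Represent each bin by its midpoint; exact values are not kept.
    return [b * bin_width + bin_width // 2 for b in chosen]


# Hypothetical message sizes in bytes, made up for illustration.
observed = [1024, 1100, 980, 4096, 4100, 1010, 1050, 4000]
hist = build_histogram(observed, bin_width=512)
first_replay = replay_samples(hist, bin_width=512, n=5, seed=1)
second_replay = replay_samples(hist, bin_width=512, n=5, seed=1)
```

Here eight observations collapse into four bin counts, and `first_replay` equals `second_replay` because both use the same seed. The trade-off in this sketch is that each replayed value is a bin midpoint, so a wider bin gives a smaller histogram at the cost of precision.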

Provided by: North Carolina State University
Topic: Data Centers
Date Added: Mar 2011
Format: PDF
