Aggregate Memory as an Intermediate Checkpoint Storage Device


Executive Summary

Applications that generate bursty I/O loads, such as checkpointing, need additional support to perform efficiently on next-generation petascale supercomputers. Tens of thousands of processors, simultaneously generating terabytes of snapshot data at each time step, can easily overwhelm a storage system. Moreover, even at the peak I/O bandwidth currently offered by parallel file system deployments at leadership-class facilities, an application is likely to spend a significant portion of its runtime checkpointing. To address these issues, the authors propose a checkpoint storage device, built from memory resources, that acts as an intermediary to the central parallel file system.
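The intermediary idea can be illustrated with a small single-process sketch. This is not the authors' design: the class `MemoryStagingBuffer`, its methods, and the use of a local temporary directory in place of a parallel file system are all assumptions made for illustration. It shows only the general pattern of absorbing a burst of checkpoint writes in memory and draining them to slower storage in the background.

```python
import os
import queue
import tempfile
import threading


class MemoryStagingBuffer:
    """Hypothetical in-memory staging area that drains to slower storage.

    Assumption: `target_dir` stands in for the central parallel file system.
    The queue is unbounded here; a real device would have finite memory.
    """

    def __init__(self, target_dir):
        self.target_dir = target_dir
        self._pending = queue.Queue()
        self._drainer = threading.Thread(target=self._drain, daemon=True)
        self._drainer.start()

    def write_checkpoint(self, name, data):
        # Returns once the snapshot is copied into memory, so the
        # application does not wait for the slower storage write.
        self._pending.put((name, bytes(data)))

    def _drain(self):
        # Background thread: move staged snapshots to the backing store.
        while True:
            item = self._pending.get()
            if item is None:  # sentinel placed by close()
                self._pending.task_done()
                break
            name, data = item
            with open(os.path.join(self.target_dir, name), "wb") as f:
                f.write(data)
            self._pending.task_done()

    def flush(self):
        # Block until every staged snapshot has reached the backing store.
        self._pending.join()

    def close(self):
        self._pending.put(None)
        self._drainer.join()


def demo():
    # Stage three 1 KiB snapshots, then wait for the drain to finish.
    with tempfile.TemporaryDirectory() as backing_store:
        buf = MemoryStagingBuffer(backing_store)
        for step in range(3):
            buf.write_checkpoint(f"step{step}.ckpt", bytes([step]) * 1024)
        buf.flush()
        names = sorted(os.listdir(backing_store))
        buf.close()
        return names


if __name__ == "__main__":
    print(demo())
```

In this sketch the application-facing call, `write_checkpoint`, costs only a memory copy, while the slow write happens between bursts. A distributed version aggregating memory across many nodes would also have to handle capacity limits and failures, which this example does not attempt.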

  • Format: PDF
  • Size: 84.6 KB