Leveraging 3D PCRAM Technologies to Reduce Checkpoint Overhead for Future Exascale Systems

Provided by: Association for Computing Machinery
Topic: Storage
Format: PDF
The scalability of future Massively Parallel Processing (MPP) systems is severely challenged by high failure rates. Current Hard Disk Drive (HDD) based checkpointing incurs an overhead of 25% or more at the petascale. Because checkpoint frequency correlates directly with node count, novel techniques that can take more frequent checkpoints with minimal overhead are critical to implementing a reliable exascale system. In this paper, the authors leverage the upcoming Phase-Change Random Access Memory (PCRAM) technology and propose a hybrid local/global checkpointing mechanism.
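
The link between node count and checkpoint frequency can be illustrated with a small sketch. It is not taken from the paper: it assumes independent node failures (so system MTBF shrinks in proportion to node count) and uses Young's first-order approximation for the checkpoint interval. The function names and the numbers in the usage note are hypothetical.

```python
import math


def system_mtbf(node_mtbf_hours: float, node_count: int) -> float:
    """System MTBF, assuming independent, identically distributed node failures."""
    return node_mtbf_hours / node_count


def young_interval(checkpoint_cost_hours: float, mtbf_hours: float) -> float:
    """Young's first-order approximation of the checkpoint interval:
    sqrt(2 * checkpoint cost * MTBF). An assumption here, not the paper's model."""
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


def overhead_fraction(checkpoint_cost_hours: float, interval_hours: float) -> float:
    """Share of wall-clock time spent writing checkpoints (failures ignored)."""
    return checkpoint_cost_hours / (interval_hours + checkpoint_cost_hours)


if __name__ == "__main__":
    cost = 0.5  # hypothetical time to write one checkpoint, in hours
    for nodes in (10, 40, 160):
        mtbf = system_mtbf(1000.0, nodes)  # hypothetical 1000-hour node MTBF
        interval = young_interval(cost, mtbf)
        print(nodes, interval, overhead_fraction(cost, interval))
```

Under these assumptions, quadrupling the node count halves the suggested interval, so the fixed cost of each checkpoint takes up a growing share of run time. Lowering that per-checkpoint cost is the lever the abstract points to.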
