CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart

Provided by: The Ohio Society of CPAs
Topic: Big Data
Format: PDF
Checkpoint/Restart (C/R) mechanisms have been widely adopted by many MPI libraries to achieve fault tolerance. However, a major limitation of such mechanisms is the severe I/O bottleneck caused by the need to dump snapshots of all processes to persistent storage. Several studies have sought to minimize this overhead, but most of the proposed optimizations are implemented inside a specific MPI stack, checkpointing library, or application, so they are not portable to other MPI stacks and applications.