A Light-Weight Cache-Based Fault Detection and Checkpointing Scheme for MPSoCs Enabling Relaxed Execution Synchronization

Executive Summary

While technology advances have made MPSoCs a standard architecture for embedded systems, their applicability is increasingly challenged by a dramatic rise in the number of device failures that may occur during execution. Conventional fault tolerance techniques employ a duplication-and-comparison strategy to detect arbitrary execution faults and a checkpointing-and-rollback strategy to recover from a faulty state. Comparison and checkpointing are performed either at the task level, which imposes substantial overhead in verifying and backing up memory pages, or at the instruction level, which necessitates a lock-step execution model that significantly limits attainable performance.

  • Format: PDF
  • Size: 183.47 KB