Compiler-Driven Dynamic Reliability Management for On-Chip Systems Under Variabilities
In this paper, the authors present a novel Dynamic Reliability Management System (DyReMS) for on-chip systems that performs resilience-driven resource allocation and mapping. It accounts for both the resilience properties of the tasks and the heterogeneous error recovery features of the different cores. DyReMS also selects a reliable task version (out of multiple options produced by reliability-aware transformations) depending on the reliability level of the allocated core. When an error is detected, a rollback is performed. The system improves task reliability by 70% to 87% compared to a core assignment that optimizes timing reliability, i.e., one that minimizes the probability of deadline misses.
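
The interplay between core allocation and version selection can be illustrated with a minimal sketch. This is not the DyReMS algorithm; it is a toy greedy heuristic under assumed inputs. The names `Version`, `Task`, `Core`, `failure_estimate`, `pick_version`, and `allocate`, the linear failure model, the `target` threshold, and the one-task-per-core restriction are all hypothetical and chosen only for illustration. Rollback on error detection is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    name: str
    vulnerability: float  # assumed: fraction of faults that corrupt the task
    exec_time: float      # assumed: execution time in ms


@dataclass(frozen=True)
class Task:
    name: str
    deadline: float       # ms
    versions: tuple       # tuple of Version objects


@dataclass(frozen=True)
class Core:
    name: str
    fault_rate: float     # assumed: faults per ms that escape the core's protection


def failure_estimate(version, core):
    # Toy linear model: expected number of harmful faults in one execution.
    return version.vulnerability * core.fault_rate * version.exec_time


def pick_version(task, core, target):
    # Only versions that fit the deadline are candidates.
    feasible = [v for v in task.versions if v.exec_time <= task.deadline]
    if not feasible:
        return None
    # Prefer the fastest version that already meets the reliability target
    # on this core; a well-protected core can run a lightly hardened version.
    for v in sorted(feasible, key=lambda v: v.exec_time):
        if failure_estimate(v, core) <= target:
            return v
    # Otherwise fall back to the version with the lowest estimate.
    return min(feasible, key=lambda v: failure_estimate(v, core))


def allocate(tasks, cores, target=0.01):
    # Tasks that stay most vulnerable even after hardening get the most
    # reliable cores. One task per core; surplus tasks are left unmapped.
    by_need = sorted(tasks,
                     key=lambda t: min(v.vulnerability for v in t.versions),
                     reverse=True)
    by_reliability = sorted(cores, key=lambda c: c.fault_rate)
    mapping = {}
    for task, core in zip(by_need, by_reliability):
        version = pick_version(task, core, target)
        mapping[task.name] = (core.name, version.name if version else None)
    return mapping


tasks = [
    Task("fft", 10.0, (Version("fast", 0.8, 4.0), Version("hardened", 0.2, 8.0))),
    Task("log", 10.0, (Version("fast", 0.3, 2.0), Version("hardened", 0.1, 3.0))),
]
cores = [Core("c_plain", 0.05), Core("c_protected", 0.001)]
result = allocate(tasks, cores)
```

With these made-up numbers, the more vulnerable task `fft` lands on the protected core and can keep its fast version, while `log` on the unprotected core falls back to its hardened version. The point of the sketch is only that the chosen version depends on the reliability of the core the task was mapped to.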