University of Tunis El Manar
Due to the growing influence of process, voltage and temperature (PVT) variations and increasingly complex application scenarios, many-core systems rely on run-time optimization to achieve the expected performance, energy efficiency and dependability. This paper presents dual-level fault tolerance for many-core systems, provided by a software-based system agent and hardware-based local agents. The system agent performs fault-triggered, energy-aware remapping under bandwidth constraints to handle coarse-grained processor failures. The local agents provide fine-grained, link-level fault tolerance against transient and permanent errors. The paper concisely presents the architecture, the dual-level fault-tolerance techniques and the experimental results.
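The abstract does not specify how the system agent's remapping works. The Python sketch below shows one way that fault-triggered, energy-aware remapping under bandwidth constraints could look. It is not the authors' algorithm, and it rests on several assumptions: a 2D mesh network-on-chip with dimension-ordered XY routing, at most one task per core, communication energy approximated as bandwidth × hop count, and a single uniform link capacity. The greedy candidate search is a swapped-in heuristic, and every function name (`xy_route`, `link_loads`, `comm_energy`, `remap_on_fault`) is hypothetical.

```python
from itertools import product


def xy_route(src, dst):
    """Directed mesh links traversed by XY routing (assumed): X first, then Y."""
    links = []
    x, y = src
    dx, dy = dst
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def link_loads(mapping, flows):
    """Total bandwidth on each directed link for a task-to-core mapping."""
    loads = {}
    for src_task, dst_task, bw in flows:
        for link in xy_route(mapping[src_task], mapping[dst_task]):
            loads[link] = loads.get(link, 0.0) + bw
    return loads


def comm_energy(mapping, flows):
    """Hypothetical energy proxy: sum of bandwidth x hop count over all flows."""
    return sum(bw * len(xy_route(mapping[s], mapping[d])) for s, d, bw in flows)


def remap_on_fault(mapping, flows, failed, mesh_w, mesh_h, link_capacity):
    """Greedily move every task on the failed core to the free, healthy core
    that minimizes the energy proxy while keeping every link load within
    link_capacity (an assumed uniform per-link bandwidth limit)."""
    victims = [t for t, core in mapping.items() if core == failed]
    new_map = dict(mapping)
    for task in victims:
        occupied = set(new_map.values())  # one task per core (assumption)
        best, best_energy = None, float("inf")
        for cand in product(range(mesh_w), range(mesh_h)):
            if cand == failed or cand in occupied:
                continue
            trial = dict(new_map)
            trial[task] = cand
            # Bandwidth constraint: reject candidates that overload any link.
            if max(link_loads(trial, flows).values(), default=0.0) > link_capacity:
                continue
            energy = comm_energy(trial, flows)
            if energy < best_energy:
                best, best_energy = cand, energy
        if best is None:
            raise RuntimeError(f"no feasible core for task {task!r}")
        new_map[task] = best
    return new_map


if __name__ == "__main__":
    # Three tasks in a pipeline A -> B -> C on a 3x3 mesh; core (1, 0) fails.
    mapping = {"A": (0, 0), "B": (1, 0), "C": (2, 0)}
    flows = [("A", "B", 10.0), ("B", "C", 10.0)]
    print(remap_on_fault(mapping, flows, (1, 0), 3, 3, link_capacity=100.0))
    # {'A': (0, 0), 'B': (0, 1), 'C': (2, 0)}
```

Several free cores tie at four hops in the example, and the scan order of `product` breaks the tie in favor of (0, 1). A realistic system agent would probably use a richer energy model and might also migrate unaffected tasks, but the sketch isolates the two ingredients the abstract names: the fault trigger and the bandwidth-feasibility check.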