Association for Computing Machinery
A system-level approach to fault-aware resource management of many-core systems is proposed. The proposed approach, called SHiFA, is able to tolerate run-time faults at system level without any hardware overhead. In contrast to the existing system-level methods, network resources are also considered to be potentially faulty. Accordingly, applications are mapped onto healthy nodes of the system at run-time such that their interaction will not require the use of faulty elements. By utilizing the simple routing approach, results show 100% utilizability of PEs and 99.41% of successful mapping when up to 8 links are broken.