Towards Transient Fault Tolerance for Heterogeneous Computing Platforms
Source: University of Virginia
The computing demands of applications, coupled with the power wall problem in modern processors, are expected to pave the way for heterogeneous computing platforms composed of a variety of processors and hardware accelerators. Current heterogeneous platform design analyses assess area, performance, and power, but the tremendous increase in transient fault rates requires that reliability analyses be included as well. This matters because fault protection mechanisms can directly affect area, performance, and power, and they affect these metrics differently when implemented on different processing components. Heterogeneous platform design therefore requires accurate characterization of fault protection mechanisms as used in each type of processing component.