GPU-Qin: A Methodology for Evaluating the Error Resilience of GPGPU Applications
While Graphics Processing Units (GPUs) have gained wide adoption as accelerators for General-Purpose applications (GPGPU), the end-to-end reliability implications of their use have not been quantified. Fault injection is a widely used method for evaluating the reliability of applications. However, building a fault injector for GPGPU applications is challenging due to their massive parallelism, which makes it difficult to achieve representativeness while being time-efficient. This paper makes three key contributions. First, it presents the design of a fault-injection methodology to evaluate end-to-end reliability properties of application kernels running on GPUs.
Subscribe to the Innovation Insider Newsletter
Catch up on the latest tech innovations that are changing the world, including IoT, 5G, the latest about phones, security, smart cities, AI, robotics, and more. Delivered Tuesdays and Fridays