Georgia Institute of Technology
The authors consider how to improve memory latency tolerance in massively multithreaded GPGPUs when an application's thread-level parallelism is not sufficient to hide memory latency. One solution used in conventional CPU systems is prefetching, in both hardware and software. However, they show that applying such mechanisms directly to GPGPU systems does not deliver the expected performance benefits and can in fact hurt performance when not used judiciously. The paper proposes new hardware and software prefetching mechanisms tailored to GPGPU systems, which the authors refer to as Many-Thread aware prefetching (MT-prefetching) mechanisms.