Association for Computing Machinery
In this paper, the authors advocate applying formal locality analysis to the memory references of GPGPU kernels. They investigate locality of reference at different cache levels of the memory hierarchy. At the L1 cache level, they examine locality behavior at the warp, thread-block, and streaming multiprocessor (SM) levels. Using matrix multiplication as a case study, they show that their locality analysis accurately captures interesting and counter-intuitive memory-access behavior. They expect such analysis to provide useful insights for understanding memory-access behavior and for optimizing the memory hierarchy of GPU architectures.
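To illustrate why the granularity of the analysis (warp vs. thread block) matters, here is a minimal sketch. It uses LRU reuse (stack) distance as the locality metric. That is an assumption: the abstract does not say which formal metric the authors use. The sketch also assumes a naive matrix-multiplication kernel with row-major float32 matrices, 128-byte L1 lines, and 32-lane warps laid out along the output columns. It builds the cache-line trace of one warp and of two warps interleaved step by step, a crude stand-in for warp scheduling within a thread block. All function names are hypothetical.

```python
import math
from collections import Counter

LINE_BYTES = 128  # assumed L1 cache-line size
ELEM_BYTES = 4    # float32 elements
WARP_SIZE = 32    # lanes per warp


def line_id(base, index):
    """Cache-line ID of element `index` of an array starting at byte address `base`."""
    return (base + index * ELEM_BYTES) // LINE_BYTES


def warp_step_lines(n, row, col0, k):
    """Lines one warp touches at loop step k of a naive C = A * B kernel.

    Lanes compute C[row][col0 .. col0+31]. All lanes read the same A[row][k];
    each lane reads B[k][col]. Lanes hitting the same line are coalesced.
    """
    a_base, b_base = 0, n * n * ELEM_BYTES  # B stored right after A
    lines = [line_id(a_base, row * n + k)]
    lines += sorted({line_id(b_base, k * n + col0 + lane) for lane in range(WARP_SIZE)})
    return lines


def block_trace(n, rows, col0):
    """Interleave the warps for `rows` step by step (hypothetical round-robin scheduler)."""
    trace = []
    for k in range(n):
        for row in rows:
            trace += warp_step_lines(n, row, col0, k)
    return trace


def reuse_distances(trace):
    """LRU stack distance: distinct other lines touched since the last access to a line."""
    stack = []  # least recently used first, most recently used last
    dists = []
    for line in trace:
        if line in stack:
            pos = stack.index(line)
            dists.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            dists.append(math.inf)  # cold (compulsory) access
        stack.append(line)
    return dists


if __name__ == "__main__":
    warp_only = Counter(reuse_distances(block_trace(64, [0], 0)))
    two_warps = Counter(reuse_distances(block_trace(64, [0, 1], 0)))
    print("single warp:", warp_only)
    print("two warps:  ", two_warps)
```

With `n = 64` in this toy setup, the single-warp trace never reuses a B line, while its A lines are reused at distance 1. Once two warps that share the same columns are interleaved, every B line is reused at distance 1 by the second warp. So locality that is invisible at the warp level appears at the thread-block level. This is only an illustration of the kind of effect such an analysis can expose. It does not reproduce the authors' results.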