Eindhoven University of Technology
As modern GPUs rely partly on their on-chip memories to counter the imminent off-chip memory wall, the efficient use of their caches has become important for both performance and energy efficiency. However, optimizing cache locality systematically requires insight into and prediction of cache behavior. On sequential processors, stack distance (or reuse distance) theory is a well-known means of modeling cache behavior. However, it is not straightforward to apply this theory to GPUs, mainly because of the parallel execution model and fine-grained multi-threading. This paper extends reuse distance theory to GPUs by modeling: 1) the GPU's hierarchy of threads, warps, threadblocks, and sets of active threads; 2) conditional and non-uniform latencies; 3) cache associativity; 4) miss-status holding registers (MSHRs); and 5) warp divergence.
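As background, the classical (sequential) reuse distance of a memory access is the number of distinct addresses touched since the previous access to the same address; under a fully-associative LRU cache of capacity C, an access hits exactly when its reuse distance is below C. The following sketch, not part of the paper, illustrates this baseline notion on a simple address trace; the function name and trace are illustrative only:

```python
from collections import OrderedDict

def reuse_distances(trace):
    """Return the reuse distance of each access in `trace`: the number of
    distinct addresses seen since the previous access to the same address,
    or infinity for a first-time (cold) access."""
    stack = OrderedDict()  # LRU stack: most recently used address is last
    dists = []
    for addr in trace:
        if addr in stack:
            # Distance = number of distinct addresses used since last access
            keys = list(stack)
            dists.append(len(keys) - 1 - keys.index(addr))
            del stack[addr]  # remove so re-insertion moves it to the top
        else:
            dists.append(float("inf"))
        stack[addr] = None  # (re-)insert as most recently used
    return dists

# Example: the second access to 'a' has distance 2 ('b' and 'c' intervened),
# so it misses in a 2-entry LRU cache but hits in a 3-entry one.
dists = reuse_distances(["a", "b", "c", "a"])
hits_cap3 = [d < 3 for d in dists]
```

Modeling a GPU with this theory is precisely what is non-trivial: interleaved warps from many threadblocks share the cache, so a single sequential trace like the one above does not exist a priori.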