Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors


Executive Summary

As core counts increase, future chip multiprocessors (CMPs) are likely to have a tiled architecture, with a portion of the shared L2 cache on each tile and a bank-interleaved distribution of the address space. Although such an organization is effective at avoiding access hot-spots, it can cause a significant number of non-local L2 accesses for many commonly occurring regular data access patterns. In this paper, the authors develop a compile-time framework for data locality optimization via data layout transformation.
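The effect described above can be illustrated with a small model. The following is a minimal sketch, not the authors' framework: it assumes a hypothetical 4-tile CMP with one L2 bank per tile, 64-byte cache lines interleaved round-robin across banks, 8-byte elements, and a block-partitioned loop in which core c works on the c-th contiguous chunk of a one-dimensional array. All names and constants (`NUM_TILES`, `home_bank`, `make_transformed`, and so on) are assumptions made for illustration.

```python
LINE_BYTES = 64   # assumed cache-line size
NUM_TILES = 4     # assumed tile count, one L2 bank per tile
ELEM_BYTES = 8    # assumed element size (e.g. a double)
ELEMS_PER_LINE = LINE_BYTES // ELEM_BYTES


def home_bank(byte_addr):
    """Bank-interleaved mapping: consecutive cache lines go to consecutive banks."""
    return (byte_addr // LINE_BYTES) % NUM_TILES


def local_fraction(n_elems, index_to_offset):
    """Fraction of accesses served by the accessing core's own bank.

    Core c is assumed to access logical elements [c*chunk, (c+1)*chunk).
    index_to_offset maps a logical index to its element offset in memory.
    """
    chunk = n_elems // NUM_TILES
    local = 0
    for i in range(n_elems):
        core = i // chunk
        if home_bank(index_to_offset(i) * ELEM_BYTES) == core:
            local += 1
    return local / n_elems


def canonical(i):
    """Untransformed layout: logical index equals memory offset."""
    return i


def make_transformed(n_elems):
    """Layout that places each core's k-th cache line at global line k*NUM_TILES + core.

    Assumes n_elems is a multiple of NUM_TILES * ELEMS_PER_LINE.
    """
    chunk = n_elems // NUM_TILES

    def transformed(i):
        core, within = divmod(i, chunk)
        line, off = divmod(within, ELEMS_PER_LINE)
        return (line * NUM_TILES + core) * ELEMS_PER_LINE + off

    return transformed


if __name__ == "__main__":
    n = 1024
    print("canonical layout  :", local_fraction(n, canonical))
    print("transformed layout:", local_fraction(n, make_transformed(n)))
```

Under these assumptions, the canonical layout spreads each core's chunk evenly over all four banks, so only a quarter of its accesses are local, while the permuted layout maps every line a core touches to that core's own bank. The permutation is a bijection on element offsets, so it changes only where data lives, not what the loop computes.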

  • Format: PDF
  • Size: 306.1 KB