Institute of Electrical & Electronic Engineers
In a typical 2D-mesh-based Chip Multi-Processor (CMP), each node contains one core, private L1 instruction and data caches, and L2 storage. For CMPs that have a large number of cores and run large-scale workloads, a distributed shared last-level cache is widely adopted because it can keep more data on-chip. However, a major challenge in such multi-core architectures is supplying data to the different cores: as the number of nodes increases, the average L1 miss latency also increases, which can cancel out the capacity advantage of the shared L2.
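The scaling problem can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from the paper. It assumes an n x n mesh, a home L2 bank chosen uniformly at random for each miss, dimension-ordered routing (so the hop count equals the Manhattan distance), and hypothetical values for the per-hop and L2 bank latencies (`hop_cycles`, `l2_bank_cycles`). Contention and the cost of L2 misses are ignored.

```python
from itertools import product


def average_hops(n: int) -> float:
    """Mean Manhattan distance between a requesting node and its home L2 bank
    in an n x n mesh, assuming both are chosen uniformly at random."""
    nodes = list(product(range(n), repeat=2))
    total = sum(
        abs(sx - dx) + abs(sy - dy)
        for (sx, sy) in nodes
        for (dx, dy) in nodes
    )
    return total / (len(nodes) ** 2)


def average_l1_miss_latency(n: int, hop_cycles: int = 3, l2_bank_cycles: int = 10) -> float:
    """Estimated cycles to serve an L1 miss from a remote L2 bank.

    hop_cycles and l2_bank_cycles are illustrative assumptions, not measured
    values. The request and the reply each traverse the average hop distance.
    """
    return 2 * average_hops(n) * hop_cycles + l2_bank_cycles


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(f"{n * n:4d} nodes: {average_hops(n):.2f} hops on average, "
              f"{average_l1_miss_latency(n):.1f} cycles per L1 miss")
```

Under these assumptions the average hop count grows roughly linearly with the mesh side length n, so the network portion of the L1 miss latency grows with the square root of the node count while the L2 bank access time stays fixed.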