Date Added: Jan 2011
This paper addresses cache organization in Chip MultiProcessors (CMPs). The authors introduce Nahalal, a novel Non-Uniform Cache Architecture (NUCA) topology that gives all processors fast access to shared data while keeping each processor's private data close to it. Their characterization of memory access patterns in typical parallel programs shows that such a topology is appropriate for common multiprocessor applications. Detailed simulations in Simics demonstrate that Nahalal decreases shared cache access latency by up to 54% compared to traditional CMP designs, yielding performance gains of up to 16.3% in run time.