Association for Computing Machinery
One of the many challenges of designing efficient manycore systems is determining where, and to what degree, shared information is cached locally. In this paper, the authors address efficient solutions for distributing virtual-to-physical address translations and keeping them coherent throughout a Chip Multi-Processor (CMP) system with hundreds of cores. They use software simulation to evaluate multiple mechanisms in terms of performance and overhead. Because TLB information is rarely invalidated, they find that the mechanisms with a fast common case perform much better, and that TLB reload overhead, rather than communication, is a significant factor in the performance of many benchmarks.