A Composite and Scalable Cache Coherence Protocol for Large Scale CMPs
The number of on-chip cores in modern Chip Multi-Processors (CMPs) is growing rapidly with technology scaling. However, efficiently supporting cache coherence in large-scale CMPs remains a major challenge. Conventional snoopy and directory coherence protocols do not scale smoothly to many-core or thousand-core processors. Snoopy protocols incur a large power overhead because of the enormous number of cache tag probes triggered by broadcasts. Directory protocols incur a performance penalty due to indirection and a large storage overhead for the directories themselves.