North Carolina State University
Chip Multi-Processors (CMPs) have become a mainstream computing platform. As transistor feature sizes shrink and the number of cores increases, more scalable CMP architectures will emerge. Recently, tiled architectures have shown such scalable characteristics and have been used in many industry chips. The memory hierarchy in tiled architectures presents interesting design challenges. One major challenge is the organization of the Last Level Cache (LLC). Shared but distributed LLCs are preferred over private LLCs because they make better use of the aggregate cache capacity. However, such architectures suffer from high on-chip hit latency, since a requested block may reside in a remote tile. Breaking down the shared LLC into smaller domains called clusters, where each cluster is associated with one process or virtual machine (VM), can reduce the on-chip hit latency significantly.