Performance of Multithreaded Chip Multiprocessors and Implications for Operating System Design
The authors investigated how operating system design should be adapted for multithreaded chip multiprocessors (CMT), a new generation of processors that exploit thread-level parallelism to mask memory latency in modern workloads. They determined that the L2 cache is a critical shared resource on CMT processors and that an insufficient amount of L2 cache can undermine the ability of these processors to hide memory latency. To use the L2 cache as efficiently as possible, they propose an L2-conscious scheduling algorithm and quantify its performance potential. With this algorithm, it is possible to reduce L2 cache miss ratios by 25-37% and improve processor throughput by 27-45%.