Speedup Stacks: Identifying Scaling Bottlenecks in Multi-Threaded Applications

Multi-threaded workloads typically show sublinear speedup on multi-core hardware, i.e., the achieved speedup is not proportional to the number of cores and threads. Sublinear scaling may have multiple causes, such as poorly scalable synchronization that leads to spinning and/or yielding, and interference in shared resources such as the last-level cache (LLC) and the main memory subsystem. Programmers and processor designers need to understand the scaling bottlenecks in existing and emerging workloads in order to optimize application performance and design future hardware.

Provided by: Ghent University
Topic: Hardware
Date Added: Jan 2012
Format: PDF
