Delft University of Technology
In recent years, many-core processors have increasingly superseded sequential ones. Increasing parallelism, rather than increasing clock rate, has become the primary engine of processor performance growth, and this trend is likely to continue. As modern processors integrate more computational cores and deeper memory hierarchies, the performance gap between naively parallelized code and optimized code has become larger than ever before. Bridging this gap very often involves architecture-specific optimizations. These optimizations are difficult for application programmers to implement, since they typically focus on the basic functionality of their code.