Georgia Institute of Technology
Programmers are looking for ways to exploit the multi-core processors that have become commonplace. One option is to parallelize existing serial programs using frameworks such as OpenMP. However, such parallelization does not always yield the speedup the programmer expects. There are several reasons for this, one of which is the bottleneck presented by the memory system. Carefully optimized serial algorithms are tuned to occupy most of the available cache, yet these cache-optimized serial algorithms may exhibit the worst speedup when parallelized.