Compact Multi-Dimensional Kernel Extraction for Register Tiling

Executive Summary

To achieve high performance on multi-core processors, modern loop optimizers apply long sequences of transformations that produce complex loop structures. Downstream optimizations such as register tiling (unroll-and-jam plus scalar promotion) typically provide a significant performance improvement, but typical register tilers deliver it only when applied to simple loop structures. On complex loop structures they often fail to operate, leaving a significant amount of performance on the table. The authors present a technique called COmpact Multi-Dimensional kernel EXtraction (COMDEX), which lets register tilers operate on arbitrarily complex loop structures and deliver those performance benefits.
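
For readers unfamiliar with the term, the following is a minimal sketch of what register tiling does on a simple, perfectly nested loop, which is the kind of structure existing register tilers handle well. It is not COMDEX and is not taken from the authors' work. The matrix-vector kernel, the function names, the unroll factor of 2, and the assumption that the row count is even are all chosen here purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline kernel: y[i] += A[i][j] * x[j], with A stored row-major
 * as n * m doubles. y[i] is loaded and stored on every inner iteration. */
void matvec_baseline(size_t n, size_t m,
                     const double *A, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            y[i] += A[i * m + j] * x[j];
}

/* Register-tiled variant (illustrative, hypothetical name):
 *   1. unroll: the outer i loop advances by 2;
 *   2. jam: the two copies of the inner j loop are fused into one;
 *   3. scalar promotion: y[i] and y[i+1] are kept in local scalars
 *      (y0, y1) for the whole inner loop, and x[j] is loaded once and
 *      reused for both rows.
 * Assumes n is even so that no cleanup loop is needed, and that y does
 * not alias A or x. */
void matvec_register_tiled(size_t n, size_t m,
                           const double *A, const double *x, double *y)
{
    assert(n % 2 == 0); /* simplifying assumption of this sketch */

    for (size_t i = 0; i < n; i += 2) {
        double y0 = y[i];
        double y1 = y[i + 1];
        for (size_t j = 0; j < m; j++) {
            double xj = x[j];
            y0 += A[i * m + j] * xj;
            y1 += A[(i + 1) * m + j] * xj;
        }
        y[i] = y0;
        y[i + 1] = y1;
    }
}
```

Both functions accumulate each y[i] over j in the same order, so they are expected to produce identical results. The transformation is mechanical here because the loop bounds are simple and the nest is perfect. The complex loop structures described above are where this kind of rewrite becomes hard to apply directly.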

