To achieve high performance on multicore processors, modern loop optimizers apply long sequences of transformations that produce complex loop structures. Downstream optimizations such as register tiling (unroll-and-jam plus scalar promotion) typically provide a significant performance improvement, but typical register tilers deliver it only when applied to simple loop structures. They often fail to operate on complex loop structures, leaving a significant amount of performance on the table. The authors present a technique called COmpact Multi-Dimensional kernel EXtraction (COMDEX), which lets register tilers operate on arbitrarily complex loop structures and deliver their performance benefits there as well.
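For readers unfamiliar with register tiling, the following is a minimal sketch of unroll-and-jam plus scalar promotion on a simple matrix-vector loop nest. It does not show COMDEX itself, only the downstream optimization the abstract refers to. The function names, the row-major layout, and the unroll factor of 2 are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: y[i] += A[i][j] * x[j], with A stored row-major as n x m. */
void matvec_baseline(size_t n, size_t m, const double *A,
                     const double *x, double *y)
{
    assert(A != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            y[i] += A[i * m + j] * x[j];
}

/* Register-tiled version (hypothetical unroll factor of 2):
 *  - unroll the outer i loop by 2,
 *  - jam the two copies into a single inner j loop,
 *  - promote y[i] and y[i+1] to scalars so they can stay in registers. */
void matvec_register_tiled(size_t n, size_t m, const double *A,
                           const double *x, double *y)
{
    assert(A != NULL && x != NULL && y != NULL);
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        double y0 = y[i];          /* scalar promotion */
        double y1 = y[i + 1];
        for (size_t j = 0; j < m; j++) {
            double xj = x[j];      /* x[j] is loaded once, used twice */
            y0 += A[i * m + j] * xj;
            y1 += A[(i + 1) * m + j] * xj;
        }
        y[i] = y0;
        y[i + 1] = y1;
    }
    /* Remainder iteration when n is odd. */
    for (; i < n; i++) {
        double y0 = y[i];
        for (size_t j = 0; j < m; j++)
            y0 += A[i * m + j] * x[j];
        y[i] = y0;
    }
}

/* Returns 1 if both versions agree on a small 5 x 3 example, else 0. */
int tiled_matches_baseline(void)
{
    enum { N = 5, M = 3 };
    double A[N * M], x[M], y_ref[N], y_tiled[N];
    for (size_t k = 0; k < N * M; k++) A[k] = (double)(k + 1);
    for (size_t j = 0; j < M; j++) x[j] = (double)(j + 1);
    for (size_t i = 0; i < N; i++) y_ref[i] = y_tiled[i] = 0.0;

    matvec_baseline(N, M, A, x, y_ref);
    matvec_register_tiled(N, M, A, x, y_tiled);

    for (size_t i = 0; i < N; i++)
        if (y_ref[i] != y_tiled[i]) return 0;
    return 1;
}
```

The transformation is straightforward here because the nest is a perfect, rectangular two-level loop. On the complex loop structures the abstract describes, finding a region where this rewrite applies is the hard part.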
- Format: PDF
- Size: 374.2 KB