Data Transformations Enabling Loop Vectorization on Multithreaded Data Parallel Architectures
Loop vectorization is key to obtaining high performance on Single Instruction Multiple Data (SIMD) vector architectures, but it is significantly hindered by irregular memory access patterns in the data stream. This paper describes data transformations that enable the vectorization of loops targeting massively multithreaded data parallel architectures. The authors present a mathematical model that captures loop-based memory access patterns and computes the most appropriate data transformations to enable vectorization. Experimental results show that the proposed data transformations can significantly increase the number of loops that can be vectorized and enhance the data-level parallelism of applications.
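To make the general idea concrete, the following is a minimal sketch of one well-known kind of data transformation: reordering an array so that a strided loop access becomes a unit-stride access. It is not the model proposed in the paper. The function names, the restriction to a single constant stride, and the assumption that the offset lies in the range 0 to stride - 1 are all illustrative choices made here.

```python
def strided_addresses(n_iters, stride, offset=0):
    """Addresses touched by a loop that reads a[stride * i + offset]."""
    return [stride * i + offset for i in range(n_iters)]


def transpose_layout(data, stride):
    """Reorder data so elements with equal (index % stride) become contiguous.

    The element at old index j moves to (j % stride) * rows + j // stride,
    where rows = len(data) // stride. This is a plain 2-D transpose of the
    array viewed as rows x stride (an illustrative transformation only).
    """
    if len(data) % stride != 0:
        raise ValueError("length must be a multiple of stride")
    rows = len(data) // stride
    out = [None] * len(data)
    for j, value in enumerate(data):
        out[(j % stride) * rows + j // stride] = value
    return out


def transformed_addresses(n_iters, stride, total_len, offset=0):
    """Addresses the same loop touches after transpose_layout is applied.

    Assumes 0 <= offset < stride and stride * n_iters <= total_len.
    """
    rows = total_len // stride
    return [(a % stride) * rows + a // stride
            for a in strided_addresses(n_iters, stride, offset)]


def is_unit_stride(addresses):
    """True when consecutive iterations touch consecutive addresses."""
    return all(b - a == 1 for a, b in zip(addresses, addresses[1:]))
```

With a stride of 3 over 12 elements, the original loop touches addresses that are 3 apart. After the reordering, the same loop touches 4 consecutive addresses, which is the contiguous pattern that SIMD loads and coalesced accesses on multithreaded architectures favor.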