Association for Computing Machinery
Single Instruction, Multiple Data (SIMD) vectorization is a major driver of performance in current architectures, and is mandatory for achieving good performance with codes that are limited by instruction throughput. The authors investigate the efficiency of different SIMD-vectorized implementations of the RabbitCT benchmark. RabbitCT performs 3D image reconstruction by back projection, a vital operation in computed tomography applications. The underlying algorithm is a challenge for vectorization because it consists, apart from a streaming part, also of a bilinear interpolation requiring scattered access to image data. They analyze the performance of SSE (128 bit), AVX (256 bit), AVX2 (256 bit), and IMCI (512 bit) implementations on recent Intel x86 systems. A special emphasis is put on the vector gather implementation on Intel Haswell and Knights Corner micro-architectures.