Separable 2D Convolution With Polymorphic Register Files

Executive Summary

Processor designers are exploring several ways to use the steadily increasing number of transistors that each new semiconductor technology generation provides. This paper studies the performance of separable 2D convolution on multi-lane Polymorphic Register Files (PRFs). The authors present a matrix transposition algorithm optimized for PRFs and a vectorized 2D convolution algorithm that avoids strided memory accesses. They compare the throughput of their PRF against the nVidia Tesla C2050 GPU. The results show that even in bandwidth-constrained systems, multi-lane PRFs can outperform the GPU for 9
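The sketch below is only an illustration of the general idea. It is a plain-Python example, not the authors' PRF algorithm. A separable K x K kernel is the outer product of a column vector and a row vector. Because of that, the 2D filter can be applied as two 1D passes, which costs about 2K multiply-accumulates per output instead of K*K. The intermediate result is transposed so that the vertical pass also walks along rows, with no strided column reads. All function names are hypothetical. The code assumes a "valid" output region and cross-correlation indexing, which is common in image-filtering code. For symmetric kernels this is identical to true convolution.

```python
from typing import List

Matrix = List[List[float]]  # hypothetical row-major matrix type


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a rectangular matrix."""
    return [list(col) for col in zip(*m)]


def conv_rows(m: Matrix, k: List[float]) -> Matrix:
    """Apply 1D kernel k along every row (valid region, no kernel flip)."""
    n = len(k)
    return [
        [sum(row[j + t] * k[t] for t in range(n)) for j in range(len(row) - n + 1)]
        for row in m
    ]


def separable_conv2d(img: Matrix, k_row: List[float], k_col: List[float]) -> Matrix:
    """Filter img with the outer-product kernel k_col x k_row using two row passes."""
    tmp = conv_rows(img, k_row)      # horizontal pass: contiguous row reads
    tmp_t = transpose(tmp)           # columns become rows
    out_t = conv_rows(tmp_t, k_col)  # vertical pass, again performed along rows
    return transpose(out_t)          # restore the original orientation


def direct_conv2d(img: Matrix, k_row: List[float], k_col: List[float]) -> Matrix:
    """Reference version: apply the full 2D kernel directly (K*K MACs per output)."""
    kr, kc = len(k_row), len(k_col)
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[i + a][j + b] * k_col[a] * k_row[b]
                for a in range(kc) for b in range(kr))
            for j in range(w - kr + 1)
        ]
        for i in range(h - kc + 1)
    ]


if __name__ == "__main__":
    img = [[float(i * 5 + j) for j in range(5)] for i in range(5)]
    k = [1.0, 2.0, 1.0]
    print(separable_conv2d(img, k, k))
    # → [[96.0, 112.0, 128.0], [176.0, 192.0, 208.0], [256.0, 272.0, 288.0]]
```

The two explicit transposes take the place of the column-wise pass. In the paper this role is played by a PRF-optimized transposition. The list-based `transpose` here only stands in for it and says nothing about how the authors implement it.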

  • Format: PDF
  • Size: 296.29 KB