Delft University of Technology
Research has mainly focused on exploiting thread-level parallelism (TLP) to increase performance. Specialization, however, offers another avenue for achieving performance scalability. In this paper the authors propose a small set of application-specific intra-vector instructions for two-dimensional signal processing kernels. Using SIMD capabilities for row-wise operations incurs significant data rearrangement overhead. Performing those operations with intra-vector instructions instead avoids that overhead. The authors implemented the intra-vector instructions in the Cell SPU core and measured speedups of up to 2.06, with an average of 1.45.