Intra-Vector SIMD Instructions for Core Specialization
Current research focuses mainly on exploiting thread-level parallelism (TLP) to increase performance. Another avenue for achieving performance scalability, however, is specialization. In this paper the authors propose a handful of application-specific intra-vector instructions for two-dimensional signal processing kernels. Applying conventional SIMD instructions to the row-wise operations of such kernels incurs significant data rearrangement overhead. Using intra-vector instructions for those operations instead avoids this overhead. The authors have implemented the intra-vector instructions in the Cell SPU core and measured speedups of up to 2.06x, with an average of 1.45x.
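To illustrate why row-wise operations are awkward for conventional SIMD, the following minimal sketch emulates 4-lane vectors with plain Python lists and computes the row sums of a 4x4 block in two ways. It is not the authors' instruction set: the lane count, the helper names, and the `intra_sum` operation are assumptions chosen only to show the idea. The lane-wise version must first transpose the block (the data rearrangement step), while the intra-vector version reduces each row directly.

```python
LANES = 4  # assumed vector width, e.g. four 32-bit lanes in a 128-bit register


def vadd(a, b):
    """Conventional (inter-vector) SIMD add: lane i of a plus lane i of b."""
    return [x + y for x, y in zip(a, b)]


def transpose4x4(rows):
    """Data rearrangement step; on real hardware this takes several shuffles."""
    return [list(col) for col in zip(*rows)]


def row_sums_inter(rows):
    """Row sums using only lane-wise adds: transpose, then add the columns."""
    cols = transpose4x4(rows)
    acc = cols[0]
    for col in cols[1:]:
        acc = vadd(acc, col)
    return acc


def intra_sum(v):
    """Hypothetical intra-vector operation: combines the lanes of one vector."""
    return sum(v)


def row_sums_intra(rows):
    """Row sums with the intra-vector operation: no transpose is needed."""
    return [intra_sum(row) for row in rows]


block = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
```

Both `row_sums_inter(block)` and `row_sums_intra(block)` return the same result; the difference is that the second never rearranges the data, which is the overhead the proposed instructions are meant to remove.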