Delft University of Technology
The era of Chip Multi-Processors (CMPs) has arrived, and at the time of writing they are already being deployed in many market segments. In this paper, the authors propose architectural enhancements that specialize the Cell SPU for video decoding. Through a thorough analysis of the H.264 video decoding kernels, they identify the execution bottlenecks, among which are matrix transposition, scalar operations, and the lack of saturating arithmetic. Based on these bottlenecks, they propose ISA extensions that speed up execution. The speedup achieved on the IDCT8, IDCT4, and deblocking filter kernels ranges from 1.69 to 2.01.
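To illustrate why the lack of saturating arithmetic can become a bottleneck, the following is a minimal scalar sketch in C of the clipping step that typically follows an inverse transform: a residual is added to a predicted pixel and the result is clamped to the 8-bit range. The function names `clip_u8` and `add_residual_sat` are hypothetical and chosen for this illustration; this is not the authors' SPU code, and it does not show the proposed ISA extensions.

```c
#include <stdint.h>

/* Hypothetical helper: clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Hypothetical example: add a residual to each predicted pixel,
 * saturating the sum to [0, 255]. Without a saturating add in the
 * instruction set, every element pays for the two comparisons above. */
static void add_residual_sat(uint8_t *dst, const uint8_t *pred,
                             const int16_t *res, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = clip_u8((int)pred[i] + (int)res[i]);
}
```

On hardware with a saturating add, the compare-and-select sequence in `clip_u8` can collapse into the addition itself, which is the kind of overhead the bottleneck analysis points to.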