Performance Evaluation of Macroblock-level Parallelization of H.264 Decoding on a cc-NUMA Multiprocessor Architecture
In this paper, the authors study the performance scalability of a macroblock-level parallelization of the H.264 decoder for High Definition (HD) applications on a multiprocessor architecture. They implemented this parallelization on a cache-coherent Non-Uniform Memory Access (cc-NUMA) shared-memory multiprocessor (SMP) and compared the measured results with the theoretical expectations. The paper evaluates three scheduling techniques: static, dynamic, and dynamic with tail-submit. Dynamic scheduling with the tail-submit optimization performs best, reaching a maximum speedup of 9.5 on 24 processors.
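To put the reported speedup in context, the following is a minimal sketch of how a theoretical bound for macroblock-level parallelism can be estimated. It is not the authors' model. It assumes that each macroblock waits for its left, upper-left, upper and upper-right neighbours (the usual H.264 intra-frame dependencies), that every macroblock takes one unit time slot, and that an HD frame is 1920x1088 pixels (120x68 macroblocks). The function names `wavefront_profile` and `parallel_efficiency` are hypothetical.

```python
from collections import Counter


def wavefront_profile(mb_cols, mb_rows):
    """Return (time_slots, max_parallel_mbs) for an idealized 2D wavefront.

    Assumption: macroblock (x, y) can start one slot after the latest of its
    left, upper-left, upper and upper-right neighbours, and every macroblock
    takes exactly one slot with no scheduling or memory overhead.
    """
    start = [[0] * mb_cols for _ in range(mb_rows)]
    for y in range(mb_rows):
        for x in range(mb_cols):
            deps = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
            # Keep only neighbours that lie inside the frame.
            ready = [
                start[dy][dx] + 1
                for dx, dy in deps
                if 0 <= dx < mb_cols and 0 <= dy < mb_rows
            ]
            start[y][x] = max(ready, default=0)
    per_slot = Counter(slot for row in start for slot in row)
    return max(per_slot) + 1, max(per_slot.values())


def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / processors


if __name__ == "__main__":
    cols, rows = 1920 // 16, 1088 // 16  # assumed HD frame size
    time_slots, max_parallel = wavefront_profile(cols, rows)
    ideal_speedup = cols * rows / time_slots
    print(f"time slots: {time_slots}, max parallel MBs: {max_parallel}")
    print(f"idealized speedup bound: {ideal_speedup:.1f}")
    # Figures reported in the abstract: speedup 9.5 on 24 processors.
    print(f"measured efficiency: {parallel_efficiency(9.5, 24):.2f}")
```

Under these assumptions a frame needs 254 time slots and exposes at most 60 independent macroblocks at once, while the reported speedup of 9.5 on 24 processors corresponds to a parallel efficiency of about 40%. The idealized bound ignores variable macroblock decode times, entropy decoding, and NUMA memory latency, so it is only an upper limit, not a prediction.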