Date Added: Jul 2011
Future multi-core processors will necessitate exploitation of fine-grain, architecture-independent parallelism from applications to utilize many cores with relatively small local memories. The authors use c264, an end-to-end H.264 video encoder for the Cell processor based on x264, to show that exploiting fine-grain parallelism remains challenging and requires significant advancement in runtime support. Their implementation of c264 achieves speedup between 4.7× and 8.6× on six Synergistic Processing Elements (SPEs), compared to the serial version running on the Power Processing Element (PPE). They find that the programming effort associated with efficient parallelization of c264 at fine granularity is highly non-trivial.