Date Added: Oct 2009
As their needs change, customers are looking for improved application performance. This has motivated application developers to investigate hybrid architectures as a way to improve the performance of High Performance Computing (HPC) applications. Hybrid architectures combine conventional, general-purpose CPUs with any of a variety of more specialized processors, such as GPUs, FPGAs, and Cell processors. The complex nature of hybrid architectures makes understanding and reasoning about application performance difficult without appropriate tool support. This paper presents a profiling library that is capable of tracing intra-Cell DMA events as well as inter-Cell message passing. The library's implementation is resource-efficient: it consumes only 12 KiB of SPE local store memory and incurs an overhead of less than 3.2 µs per profile call. The paper also describes a methodology for profiling parallel applications executing on the IBM PowerXCell 8i, commonly referred to as the Cell processor.