Architecture-Aware Mapping and Optimization on a 1600-Core GPU
The Graphics Processing Unit (GPU) continues to make inroads as a computational accelerator for High-Performance Computing (HPC). However, despite its increasing popularity, mapping and optimizing GPU code remains a difficult task; it is a multi-dimensional problem that requires deep technical knowledge of GPU architecture. Although substantial literature exists on how to map and optimize code for the more mature NVIDIA CUDA architecture, the opposite is true for OpenCL on an AMD GPU, such as the 1600-core AMD Radeon HD 5870. Consequently, the authors present and evaluate architecture-aware mapping and optimizations for the AMD GPU.