Date Added: Mar 2011
High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment that GPUs currently present. One way of addressing this challenge is to embrace better techniques and to develop tools tailored to the needs of GPU programming. This paper presents one simple technique, GPU Run-Time Code Generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support it.
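To make the idea of run-time code generation concrete, the following is a minimal sketch and not code taken from the paper. It builds CUDA C source text from a template while the Python program runs, so the element type and a scaling constant are baked into the kernel before compilation. The names `scale`, `generate_kernel_source`, and `compile_and_run` are hypothetical and chosen for illustration. The compile step assumes PyCUDA, NumPy, and a CUDA-capable GPU are available, and that the array's NumPy dtype matches the generated C type.

```python
from string import Template

# CUDA C source kept as a template; ${dtype} and ${factor} are filled in at run time.
KERNEL_TEMPLATE = Template("""
__global__ void scale(${dtype} *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * (${dtype})${factor};
}
""")


def generate_kernel_source(dtype="float", factor=2.0):
    """Return CUDA C source specialized for one element type and one constant."""
    return KERNEL_TEMPLATE.substitute(dtype=dtype, factor=repr(float(factor)))


def compile_and_run(source, host_array):
    """Compile the generated source with PyCUDA and apply it to host_array in place.

    Assumption: requires PyCUDA, NumPy, and a CUDA-capable GPU, so the imports
    are kept local and the source generator above works without them.
    """
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    module = SourceModule(source)          # compiles the source at run time
    scale = module.get_function("scale")   # look up the kernel by name

    n = host_array.size
    block = 256
    grid = (n + block - 1) // block        # enough blocks to cover n elements
    scale(drv.InOut(host_array), np.int32(n), block=(block, 1, 1), grid=(grid, 1))
    return host_array


if __name__ == "__main__":
    # Printing the generated source needs no GPU.
    print(generate_kernel_source(dtype="float", factor=2.0))
```

Because the kernel is ordinary text until it is compiled, the host program can specialize it for the data it actually sees at run time, for example by choosing the type or fixing constants, rather than shipping one precompiled kernel per case.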