Usable Assembly Language for GPUs: A Success Story
The NVIDIA compilers nvcc and ptxas leave the programmer with only very limited control over register allocation, register spills, instruction selection, and instruction scheduling. In theory a programmer can gain control by writing an entire kernel in van der Laan's cudasm assembly language, but this requires tedious, error-prone tracking of register assignments. This paper introduces a higher-level assembly language, qhasm-cudasm, that allows much faster programming while providing the same amount of control over the GPU. This language has been used successfully to build a 90000-machine-instruction kernel for a computation described in detail in the paper, the largest public cryptanalytic project in history.