Accelerating CUDA Graph Algorithms at Maximum Warp
Graphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems, but their performance suffers heavily when the graph structure is highly irregular, as most real-world graphs tend to be. In this paper, the authors first observe that the poor performance is caused by work imbalance, which is an artifact of a discrepancy between the GPU programming model and the underlying GPU architecture. They then propose a novel virtual warp-centric programming method that exposes the traits of the underlying GPU architecture to users.
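
To make the work-imbalance argument concrete, the following host-side C++ sketch counts the lockstep iterations one 32-lane warp would spend under two vertex-to-lane mappings. It is a simplified cost model, not the authors' implementation. It assumes that a "virtual warp" is a group of lanes inside one physical warp that cooperates on a single vertex's neighbor list, and the names `steps_thread_per_vertex`, `steps_virtual_warp`, and `vw_size` are hypothetical, introduced only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed physical warp width (32 lanes on current NVIDIA GPUs).
constexpr int kWarpSize = 32;

// Thread-per-vertex mapping: each lane walks one vertex's neighbor list.
// Lanes run in lockstep, so every batch of 32 vertices costs as many
// iterations as its highest-degree vertex.
long long steps_thread_per_vertex(const std::vector<int>& degrees) {
    long long steps = 0;
    for (std::size_t base = 0; base < degrees.size(); base += kWarpSize) {
        const std::size_t end = std::min(degrees.size(), base + kWarpSize);
        int longest = 0;
        for (std::size_t i = base; i < end; ++i) {
            longest = std::max(longest, degrees[i]);
        }
        steps += longest;
    }
    return steps;
}

// Virtual-warp mapping (illustrative assumption): the physical warp is split
// into kWarpSize / vw_size virtual warps. Each virtual warp takes one vertex,
// and its vw_size lanes stride across that vertex's neighbor list together.
long long steps_virtual_warp(const std::vector<int>& degrees, int vw_size) {
    // vw_size is assumed to divide the physical warp width evenly.
    assert(vw_size > 0 && kWarpSize % vw_size == 0);
    const std::size_t vertices_per_warp =
        static_cast<std::size_t>(kWarpSize / vw_size);
    long long steps = 0;
    for (std::size_t base = 0; base < degrees.size(); base += vertices_per_warp) {
        const std::size_t end = std::min(degrees.size(), base + vertices_per_warp);
        int longest = 0;
        for (std::size_t i = base; i < end; ++i) {
            // Rounds needed for vw_size lanes to cover degrees[i] neighbors.
            const int rounds = (degrees[i] + vw_size - 1) / vw_size;
            longest = std::max(longest, rounds);
        }
        steps += longest;
    }
    return steps;
}
```

As a usage example, take 32 vertices where one hub has degree 1024 and the rest have degree 1. Under this model the thread-per-vertex mapping costs 1024 lockstep iterations, because 31 lanes idle while one lane walks the hub. With `vw_size = 32` the cost drops to 63 iterations, and with `vw_size = 8` it is 135. Smaller virtual warps waste fewer lanes on low-degree vertices but balance high-degree vertices less well, which is why the virtual warp width is treated here as a tunable parameter.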