SIMD Re-Convergence at Thread Frontiers
Hardware and compiler techniques for mapping data-parallel programs with divergent control flow to SIMD architectures have recently enabled the emergence of new GPGPU programming models such as CUDA, OpenCL, and DirectX Compute. The impact of branch divergence can be quite different depending upon whether the program's control flow is structured or unstructured. In this paper, the authors show that unstructured control flow occurs frequently in applications and can lead to significant code expansion when executed using existing approaches for handling branch divergence. This paper proposes a new technique for automatically mapping arbitrary control flow onto SIMD processors that relies on a concept of a Thread Frontier, which is a bounded region of the program containing all threads that have branched away from the current warp.