Warp-Level Divergence in GPUs: Characterization, Impact, and Mitigation

Provided by: North Carolina State University
Topic: Hardware
Format: PDF
High-throughput architectures rely on high Thread-Level Parallelism (TLP) to hide execution latencies. In state-of-the-art Graphics Processing Units (GPUs), threads are organized in a grid of Thread Blocks (TBs), and each TB contains tens to hundreds of threads. Under a TB-level resource management scheme, all the resources required by a TB are allocated when it is dispatched to a Streaming Multiprocessor (SM) and released only when the TB finishes. In this paper, the authors highlight that such TB-level resource management can severely limit the TLP that can be achieved in the hardware. First, different warps in a TB may finish at different times, a phenomenon they refer to as 'warp-level divergence'.
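As a rough illustration (not code from the paper), the CUDA sketch below constructs warp-level divergence deliberately: warps within one TB receive unequal work, so early-finishing warps sit idle while the TB's registers, shared memory, and TB slot on the SM remain allocated until the slowest warp completes. The kernel name, launch configuration, and per-warp workload are illustrative assumptions.

```cuda
// Minimal sketch of warp-level divergence: within a single thread block (TB),
// warps do unequal amounts of work, so the TB's resources stay allocated on
// the SM until the slowest warp finishes, even though other warps are done.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void unevenWarpWork(float *out, int itersPerWarp)
{
    int tid    = blockIdx.x * blockDim.x + threadIdx.x;
    int warpId = threadIdx.x / warpSize;   // warp index within this TB

    // Each warp loops a different number of times: warp 0 exits quickly,
    // the highest-numbered warp runs longest. Under TB-level resource
    // management, the whole TB's registers/shared memory are held until
    // this slowest warp completes.
    float v = 0.0f;
    for (int i = 0; i < (warpId + 1) * itersPerWarp; ++i)
        v += sinf(v + (float)tid);

    out[tid] = v;
}

int main()
{
    const int threadsPerTB = 256;          // 8 warps of 32 threads each
    const int numTBs       = 64;
    float *out = nullptr;
    cudaMalloc(&out, numTBs * threadsPerTB * sizeof(float));

    unevenWarpWork<<<numTBs, threadsPerTB>>>(out, 10000);
    cudaDeviceSynchronize();

    cudaFree(out);
    return 0;
}
```

Profiling such a kernel would show warps idling at TB completion boundaries: the SM cannot reclaim the finished warps' share of the TB's resources to launch new TBs until every warp in the TB has retired.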