Exploring Fine-Grained Task-Based Execution on Multi-GPU Systems
Using multi-GPU systems, including GPU clusters, is gaining popularity in scientific computing. However, when using multiple GPUs concurrently, the conventional data parallel GPU programming paradigms, e.g., CUDA, cannot satisfactorily address certain issues, such as load balancing, GPU resource utilization, overlapping fine-grained computation with communication, etc. In this paper, the authors present a fine-grained task-based execution framework for multi-GPU systems. By scheduling finer-grained tasks than what is supported in the conventional CUDA programming method among multiple GPUs, and allowing concurrent task execution on a single GPU, their framework provides means for solving the above issues and efficiently utilizing multi-GPU systems.