Association for Computing Machinery
Fully utilizing the power of modern heterogeneous systems requires judiciously dividing work across all of the available computational devices. Existing approaches to partitioning work require offline training and generate fixed partitions that cannot respond to run-time fluctuations in device performance. The authors present a novel dynamic approach to work partitioning that requires no offline training and responds automatically to performance variability, providing consistently good performance. Using six diverse OpenCL applications, they demonstrate the effectiveness of their approach in scenarios both with and without run-time performance variability, as well as in more extreme scenarios in which one device is non-functional.