Delft University of Technology
In this paper, the authors present a comprehensive performance comparison between CUDA and OpenCL. They select 16 benchmarks, ranging from synthetic applications to real-world ones, and analyze the performance gaps extensively, taking into account programming models, optimization strategies, architectural details, and the underlying compilers. Their results show that, for most applications, CUDA performs at most 30% better than OpenCL. They also show that this gap stems from unfair comparisons: under a fair comparison, OpenCL can achieve performance similar to CUDA's.