Provided by: Institute of Electrical and Electronics Engineers
Date Added: Feb 2013
In this paper, the authors report on the development of a performance-portable OpenCL implementation of Sandia's miniMD benchmark. They show that the force compute kernel has the same performance bottlenecks across several architectures, and that the optimizations they apply to the original scalar code improve performance by more than 2x across a wide range of hardware from different vendors: CPUs and integrated GPUs from AMD and Intel, and discrete GPUs from AMD and Nvidia. Their complete OpenCL implementation is 1.7x faster than the original miniMD code running on the same hardware, and at most 2x slower than "Native" implementations highly optimized for particular platforms.