Direct N-Body Kernels for Multicore Platforms

The authors present an inter-architectural comparison of single- and double-precision direct n-body implementations on modern multicore platforms, including systems based on the Intel Nehalem and AMD Barcelona processors, the Sony-Toshiba-IBM PowerXCell/8i processor, and NVIDIA Tesla C870 and C1060 GPUs. They compare their implementations across platforms on a variety of proxy measures, including performance, coding complexity, and energy efficiency. The goal of the paper is to understand how the implementation techniques that direct (particle-particle) n-body simulations need to achieve good performance differ across modern multicore CPU and accelerator desktop platforms.
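
For readers unfamiliar with the term, a direct (particle-particle) n-body method evaluates every pairwise interaction explicitly, so each step costs O(N^2) work. The following is a minimal reference sketch of that all-pairs loop for softened gravitational acceleration. It is not the authors' implementation: the function name `direct_accelerations`, the `softening` and `G` parameters, and the choice of gravity as the interaction are assumptions made for illustration only.

```python
import math


def direct_accelerations(pos, mass, softening=1e-3, G=1.0):
    """All-pairs (direct) gravitational accelerations.

    pos       -- list of (x, y, z) positions (hypothetical layout)
    mass      -- list of masses, same length as pos
    softening -- assumed Plummer softening length; 0.0 gives pure Newtonian gravity
    G         -- gravitational constant in the caller's units (assumed 1.0)
    """
    n = len(pos)
    eps2 = softening * softening
    acc = []
    for i in range(n):
        xi, yi, zi = pos[i]
        ax = ay = az = 0.0
        # Inner loop over every other particle: this is the O(N^2) part.
        for j in range(n):
            if j == i:
                continue  # no self-interaction
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            s = G * mass[j] * inv_r3
            ax += s * dx
            ay += s * dy
            az += s * dz
        acc.append([ax, ay, az])
    return acc
```

The inner loop body is the part that platform-specific versions typically vectorize, tile, or map onto threads; this scalar version is meant only as a correctness baseline against which such variants could be checked.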