Domain Decomposition Method on GPU Cluster

Source: Hiroshima University

Parallel GPGPU computing for lattice QCD simulations is bottlenecked by GPU-to-GPU data communication, because there is no facility for direct data exchange between GPUs. In this work, the authors investigate the performance of a quark solver that uses the Restricted Additive Schwarz (RAS) preconditioner on a low-cost GPU cluster. They expect that the RAS preconditioner, combined with appropriate domain decomposition and task distribution, reduces the communication bottleneck. The GPU cluster they constructed consists of four PC boxes with two GPU cards each, for eight GPU cards in total. The compute nodes are connected by Gigabit Ethernet, which is rather slow but low cost.
Format: PDF
Size: 173.26
Date: Nov 2010
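
To make the idea behind the RAS preconditioner concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. As assumptions for illustration only, a 1D Laplacian stands in for the lattice Dirac operator, each subdomain is solved exactly, and a simple Richardson iteration replaces the Krylov solver a real quark solver would normally use. The names `apply_A`, `solve_local`, `ras_apply` and `ras_richardson` are hypothetical. The point it shows is the "restricted" step: each subdomain reads the residual on its overlapping region but writes back only the points it owns, so no data has to be exchanged when the local corrections are combined.

```python
import math


def apply_A(x):
    """Toy operator (assumption): 1D Laplacian stencil (-1, 2, -1), zero boundaries."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def solve_local(r):
    """Exact solve of the local (-1, 2, -1) tridiagonal block (Thomas algorithm)."""
    m = len(r)
    c = [0.0] * m  # modified super-diagonal
    d = [0.0] * m  # modified right-hand side
    c[0], d[0] = -0.5, r[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (r[i] + d[i - 1]) / denom
    x = [0.0] * m
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def ras_apply(r, n_domains, overlap):
    """Apply the RAS preconditioner: z = sum_k (restricted extension) A_k^{-1} R_k r."""
    n = len(r)
    size = n // n_domains
    z = [0.0] * n
    for k in range(n_domains):  # each k would map to one GPU on a cluster
        own_lo = k * size
        own_hi = n if k == n_domains - 1 else own_lo + size
        # Overlapping region: the only place neighbour (halo) data is read.
        lo, hi = max(0, own_lo - overlap), min(n, own_hi + overlap)
        local = solve_local(r[lo:hi])
        # Restricted write-back: keep only the owned points, drop the overlap.
        z[own_lo:own_hi] = local[own_lo - lo:own_hi - lo]
    return z


def ras_richardson(b, n_domains=8, overlap=2, tol=1e-10, max_iter=5000):
    """Stationary iteration x <- x + M^{-1}(b - A x) with M^{-1} given by RAS."""
    x = [0.0] * len(b)
    b_norm = math.sqrt(sum(v * v for v in b))
    for it in range(max_iter):
        r = [bi - ai for bi, ai in zip(b, apply_A(x))]
        if math.sqrt(sum(v * v for v in r)) <= tol * b_norm:
            return x, it
        z = ras_apply(r, n_domains, overlap)
        x = [xi + zi for xi, zi in zip(x, z)]
    return x, max_iter
```

Calling `ras_richardson(b)` with the default of eight subdomains mirrors the eight GPU cards described above, although in this sketch the subdomain loop runs serially. On a cluster, the slice `r[lo:hi]` is where one halo exchange per preconditioner application would occur, which is why the overlap width and the way subdomains are assigned to devices matter when the interconnect is slow.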