Domain Decomposition Method on GPU Cluster
Parallel GPGPU computing for lattice QCD simulations is bottlenecked by GPU-to-GPU data communication, because there is no facility for exchanging data directly between GPUs. In this work the authors investigate the performance of a quark solver that uses the Restricted Additive Schwarz (RAS) preconditioner on a low-cost GPU cluster. They expect that the RAS preconditioner, combined with an appropriate domain decomposition and task distribution, will reduce the communication bottleneck. The GPU cluster they constructed consists of four PC boxes with two GPU cards attached to each, for eight GPU cards in total. The compute nodes are connected by relatively slow but low-cost Gigabit Ethernet.
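The following minimal sketch illustrates why RAS can suit a cluster with a slow interconnect. It rests on two assumptions: a 1D tridiagonal model operator tridiag(-1, 2.1, -1) stands in for the lattice Dirac operator, and eight equal subdomains stand in for the eight GPUs. All function names are hypothetical. Each subdomain solves its own overlapping block independently, and the "restricted" step writes back only the sites that subdomain owns. In a distributed setting, the preconditioner therefore needs only one halo exchange, to gather the overlap region of the residual. A Krylov solver would normally wrap the preconditioner; for brevity this sketch uses a plain preconditioned Richardson iteration instead.

```python
# Minimal RAS (Restricted Additive Schwarz) sketch on a 1D model problem.
# Assumption: tridiag(-1, diag, -1) stands in for the lattice Dirac operator,
# and each subdomain stands in for one GPU. All names are hypothetical.


def apply_operator(x, diag):
    """Return A x for A = tridiag(-1, diag, -1) with zero boundary values."""
    n = len(x)
    y = []
    for i in range(n):
        v = diag * x[i]
        if i > 0:
            v -= x[i - 1]
        if i < n - 1:
            v -= x[i + 1]
        y.append(v)
    return y


def solve_tridiag(rhs, diag):
    """Solve tridiag(-1, diag, -1) z = rhs exactly with the Thomas algorithm."""
    n = len(rhs)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = -1.0 / diag
    dp[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag + cp[i - 1]
        cp[i] = -1.0 / denom
        dp[i] = (rhs[i] + dp[i - 1]) / denom
    z = [0.0] * n
    z[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        z[i] = dp[i] - cp[i] * z[i + 1]
    return z


def subdomains(n, parts, overlap):
    """Split sites 0..n-1 into `parts` owned (core) blocks, each extended by `overlap`."""
    size = n // parts
    doms = []
    for p in range(parts):
        core_lo = p * size
        core_hi = n if p == parts - 1 else (p + 1) * size
        lo, hi = max(0, core_lo - overlap), min(n, core_hi + overlap)
        doms.append((lo, hi, core_lo, core_hi))
    return doms


def ras_apply(r, diag, doms):
    """z = M^{-1} r: independent local solves, each writing back only its core sites."""
    z = [0.0] * len(r)
    for lo, hi, core_lo, core_hi in doms:
        local = solve_tridiag(r[lo:hi], diag)  # needs r only on the extended block
        for i in range(core_lo, core_hi):       # restricted prolongation
            z[i] = local[i - lo]
    return z


def ras_richardson(b, diag, parts, overlap, tol=1e-10, max_iter=500):
    """Preconditioned Richardson iteration x <- x + M^{-1}(b - A x)."""
    doms = subdomains(len(b), parts, overlap)
    x = [0.0] * len(b)
    for it in range(max_iter):
        r = [bi - ai for bi, ai in zip(b, apply_operator(x, diag))]
        if max(abs(v) for v in r) < tol:
            return x, it  # number of preconditioner applications used
        z = ras_apply(r, diag, doms)
        x = [xi + zi for xi, zi in zip(x, z)]
    return x, max_iter


if __name__ == "__main__":
    n, diag = 64, 2.1
    b = [1.0] * n
    for overlap in (0, 2):
        x, iters = ras_richardson(b, diag, parts=8, overlap=overlap)
        res = max(abs(bi - ai) for bi, ai in zip(b, apply_operator(x, diag)))
        print(f"overlap={overlap}: {iters} iterations, max residual {res:.1e}")
```

Setting `overlap=0` reduces RAS to block Jacobi. In this model problem, that choice needs more iterations, which shows the trade-off: a wider overlap sends more data per exchange but needs fewer exchanges overall.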