Highly Scalable Barriers for Future High-Performance Computing Clusters

In this paper, the authors demonstrate the suitability of their approach, a shared-memory architecture for clusters, by analyzing the performance of barriers, a very common synchronization primitive in parallel programs. Experiments on a real cluster prototype show that their approach synchronizes 1024 cores spread over 64 nodes in less than 15 µs, several times faster than other highly optimized barriers. They further demonstrate the feasibility of the approach by running a shared-memory implementation of the FFT. Finally, they note that this barrier can also be leveraged by MPI applications running on their shared-memory architecture for clusters.
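For readers unfamiliar with the primitive: a barrier blocks every participating thread until all of them have arrived, and only then releases them together. The sketch below is a generic, textbook centralized sense-reversing barrier built on Python's `threading.Condition`. It illustrates barrier semantics only. It is not the authors' cluster-wide barrier, whose implementation the abstract does not describe. The names `SenseReversingBarrier` and `run_demo` are hypothetical, chosen for illustration.

```python
import threading


class SenseReversingBarrier:
    """Hypothetical centralized sense-reversing barrier (textbook design).

    Illustration only: this is NOT the cluster barrier from the paper.
    """

    def __init__(self, parties: int) -> None:
        if parties < 1:
            raise ValueError("parties must be >= 1")
        self._parties = parties
        self._count = parties  # threads still expected in the current episode
        self._sense = False  # global sense, flipped once per completed episode
        self._cond = threading.Condition()

    def wait(self) -> None:
        with self._cond:
            # The sense value that will signal release of this episode.
            my_sense = not self._sense
            self._count -= 1
            if self._count == 0:
                # Last thread to arrive: reset the counter for the next
                # episode, flip the global sense, and wake all waiters.
                self._count = self._parties
                self._sense = my_sense
                self._cond.notify_all()
            else:
                # Block until the last arriver flips the global sense.
                self._cond.wait_for(lambda: self._sense == my_sense)


def run_demo(n_threads: int = 4, rounds: int = 3) -> list[tuple[int, int]]:
    """Each thread logs (round, thread_id), then waits at the barrier."""
    barrier = SenseReversingBarrier(n_threads)
    log: list[tuple[int, int]] = []
    log_lock = threading.Lock()

    def worker(tid: int) -> None:
        for r in range(rounds):
            with log_lock:
                log.append((r, tid))
            barrier.wait()  # no thread starts round r+1 until all finish round r

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because every thread logs its entry for a round before calling `wait()`, the round numbers in the returned log never decrease: the barrier guarantees that all entries for round r appear before any entry for round r+1. Flipping a sense value, rather than only resetting a counter, keeps consecutive barrier episodes from interfering when the same barrier object is reused.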

Provided by: Heidelberg University
Topic: Data Centers
Date Added: Aug 2011
Format: PDF
