TupleQ: Fully-Asynchronous and Zero-Copy MPI Over InfiniBand
The Message Passing Interface (MPI) is the de facto standard for parallel programming. As system scales increase, application writers often try to increase the overlap of communication and computation. Unfortunately, even on offloaded hardware such as InfiniBand, performance does not improve, because the underlying protocols within MPI implementations require control messages that prevent overlap unless expensive threads are used. In this work, the authors propose a fully-asynchronous and zero-copy design to allow full overlap of communication and computation.