Design and Evaluation of Generalized Collective Communication Primitives With Overlap Using ConnectX-2 Offload Engine
Collective communication operations provided by the Message Passing Interface (MPI) are heavily used by scientific applications at large scale. The current MPI standard, MPI-2.2, defines only blocking collective communication calls; MPI-3 is expected to add non-blocking collectives. Although thread-based designs can overlap computation with communication, resource sharing across threads remains a concern. The newly introduced ConnectX-2 InfiniBand adapter from Mellanox features an offload mechanism that enables the Network Interface Card (NIC) to perform a series of communication and reduction operations without involving the host processor.