A Software Based Approach for Providing Network Fault Tolerance in Clusters With UDAPL Interface: MPI Level Design and Performance Evaluation
In the arena of cluster computing, MPI has emerged as the de facto standard for writing parallel applications. At the same time, introduction of high speed RDMA-enabled interconnects like InfiniBand, Myrinet, Quadrics, RDMA-enabled Ethernet has escalated the trends in cluster computing. Network APIs like uDAPL (user Direct Access Provider Library) are being proposed to provide a network independent interface to different RDMA-enabled interconnects. Clusters with combination(s) of these interconnects are being deployed to leverage their unique features, and network failover in wake of transmission errors.