Can Software Reliability Outperform Hardware Reliability on High Performance Interconnects? A Case Study With MPI Over InfiniBand
The network interconnect is an important part of modern supercomputing platforms. As the number of compute nodes in clusters has increased, the interconnect has become increasingly important. Modern interconnects such as InfiniBand, Quadrics, and Myrinet have become popular because they offer lower latency and higher performance than traditional Ethernet. As these interconnects become more widely used and clusters continue to scale, design choices such as where data reliability should be provided become an important issue.