Data Center Switch Architecture in the Age of Merchant Silicon
Source: University of California
For massively parallel workloads, the principal bottleneck is often not the performance of individual nodes but the rate at which those nodes can exchange data over the network. Many Data Center Network (DCN) applications exhibit little communication locality, so the communication substrate must provide high aggregate bisection bandwidth to handle worst-case communication patterns. Unfortunately, modern DCN architectures typically do not scale beyond a certain bisection bandwidth and become prohibitively expensive well before reaching their maximum capacity; in some cases links are oversubscribed by a factor of 240.

They present an instance of their architecture: a 3,456-port 10GbE switch built internally as a fat-tree topology, using merchant silicon wherever possible. The design provides 34.56 Tb/s of bisection bandwidth when fully deployed. They also show how an Ethernet Extension Protocol (EEP) can further reduce the cost, power consumption, and cabling complexity of such a switch by aggregating multiple lower-speed links onto a single higher-speed link.

Packaging and cabling are only one aspect of constructing a large fat-tree-based Ethernet switch. Other issues must also be addressed: managing the forwarding tables of the individual switching elements, handling ARP and other broadcast traffic in a large Layer 2 domain, efficient multicast, fault tolerance, and routing TCP flows through the switch fabric.
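As a rough illustration of where the 3,456-port and 34.56 Tb/s figures come from, the sketch below applies the standard k-ary fat-tree sizing formulas (k pods of k-port switches, supporting k^3/4 hosts at full bisection bandwidth); the function name and parameter choices here are illustrative, not from the source.

```python
def fat_tree_capacity(k: int, link_gbps: float = 10.0):
    """Size a k-ary fat tree built from k-port switches.

    Standard construction: k pods, each with k/2 edge and k/2 aggregation
    switches, plus (k/2)^2 core switches. Each edge switch serves k/2 hosts,
    so the fabric supports k^3/4 hosts at full (1:1) bisection bandwidth.
    """
    hosts = k ** 3 // 4                           # total host-facing ports
    bisection_tbps = hosts * link_gbps / 1000.0   # one full-rate link per host
    return hosts, bisection_tbps

# A fat tree of 24-port switches yields the configuration described above:
hosts, bw = fat_tree_capacity(24)
print(hosts, bw)  # 3456 ports, 34.56 Tb/s
```

With 10GbE links, k = 24 is the smallest radix that reaches 3,456 host ports, which is consistent with the fully deployed bisection bandwidth quoted in the text.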