NVIDIA has unveiled the Tesla V100-powered HGX-2 appliance, the follow-up to last year’s HGX-1 cloud server system intended for GPU compute workloads. The HGX-2, unveiled in a Tuesday press release, is meant for developers working with large data sets, such as those associated with image recognition and language translation.

In terms of raw computational ability, the HGX-2 is powered by two baseboards, each carrying eight V100 GPUs and six NVSwitches, the release noted. The NVSwitches let the GPUs communicate with each other via NVLink, a protocol devised by NVIDIA to overcome the limitations of PCI Express. While PCI Express relies on a central hub to negotiate communication, NVLink allows multiple links per device and uses mesh networking. Because of this design, each GPU has full-bandwidth access to any other GPU, including one on a different baseboard, according to NVIDIA.
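
To make the connectivity claim concrete, here is a minimal Python sketch that counts how many GPU pairs have a single-hop, full-bandwidth path in two simplified topologies. The function names are hypothetical, and both models are assumptions: the HGX-2 is treated as 16 GPUs (two baseboards of eight) whose NVSwitch fabric links every pair, and the HGX-1 is approximated as two fully connected groups of four GPUs with one link between corresponding GPUs in each group, roughly the hybrid cube-mesh layout. Real link counts and per-link bandwidths are not modeled.

```python
from itertools import combinations


def hgx2_model(gpus_per_board=8, boards=2):
    """Simplified assumption: the NVSwitch fabric gives every GPU a
    one-hop, full-bandwidth path to every other GPU, across baseboards."""
    gpus = [(board, slot) for board in range(boards) for slot in range(gpus_per_board)]
    direct = {frozenset(pair) for pair in combinations(gpus, 2)}
    return gpus, direct


def hgx1_model():
    """Simplified hybrid cube-mesh assumption: two fully connected groups
    of four GPUs, plus one link between GPU i and GPU i + 4."""
    gpus = list(range(8))
    direct = set()
    for group in (range(0, 4), range(4, 8)):
        direct |= {frozenset(pair) for pair in combinations(group, 2)}
    direct |= {frozenset((i, i + 4)) for i in range(4)}
    return gpus, direct


def direct_pairs(model):
    """Return (pairs with a one-hop link, total GPU pairs)."""
    gpus, direct = model
    pairs = list(combinations(gpus, 2))
    n_direct = sum(frozenset(pair) in direct for pair in pairs)
    return n_direct, len(pairs)


print(direct_pairs(hgx2_model()))  # (120, 120)
print(direct_pairs(hgx1_model()))  # (16, 28)
```

Under these assumptions, all 120 GPU pairs on the HGX-2 are one switched hop apart, versus 16 of 28 pairs on the HGX-1; the remaining HGX-1 pairs must relay traffic through an intermediate GPU.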

The HGX-2 delivers 2 petaflops for tensor operations, has 512 GB of RAM, and offers a bisection bandwidth of 2,400 GB/s. For comparison, the 8-GPU HGX-1 delivers only 1 petaflop, has 256 GB of RAM, and has a bisection bandwidth of 300 GB/s. On the original model, only four GPUs are fully connected through NVLink, which accounts for its lower interconnect bandwidth.
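
The comparison above reduces to simple arithmetic. This sketch divides the quoted totals by GPU count and takes ratios between the two systems. It assumes the HGX-2 holds 16 V100 GPUs (two baseboards of eight); the dictionary keys and helper names are illustrative only.

```python
# Totals quoted in the article; the HGX-2 GPU count of 16 is an assumption
# based on two baseboards of eight V100s each.
HGX2 = {"gpus": 16, "tensor_pflops": 2.0, "memory_gb": 512, "bisection_gb_per_s": 2400}
HGX1 = {"gpus": 8, "tensor_pflops": 1.0, "memory_gb": 256, "bisection_gb_per_s": 300}


def per_gpu(system):
    """Return (tensor TFLOPS per GPU, memory GB per GPU)."""
    tflops = system["tensor_pflops"] * 1000 / system["gpus"]
    memory = system["memory_gb"] / system["gpus"]
    return tflops, memory


def ratios(new, old):
    """Ratio of each quoted total, new system over old system."""
    keys = ("tensor_pflops", "memory_gb", "bisection_gb_per_s")
    return {key: new[key] / old[key] for key in keys}


print(per_gpu(HGX2))       # (125.0, 32.0)
print(per_gpu(HGX1))       # (125.0, 32.0)
print(ratios(HGX2, HGX1))  # {'tensor_pflops': 2.0, 'memory_gb': 2.0, 'bisection_gb_per_s': 8.0}
```

Per-GPU compute and memory come out identical (125 TFLOPS and 32 GB), so the 2x gains in those columns come from doubling the GPU count, while bisection bandwidth grows 8x. That gap is consistent with the point that the interconnect, not the individual GPUs, is what changed.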

NVIDIA claims in the release that the HGX-2 “can replace 300 dual CPU server nodes on deep learning training.” In benchmarks comparing a single HGX-2 system against two HGX-1 systems linked by 4x100Gb InfiniBand interconnects, the HGX-2 showed a 2x speedup on MILC (a simulation that studies quantum chromodynamics), a 2.4x speedup on ECMWF (a global weather prediction model), and a 2.7x speedup on the Transformer with Mixture of Experts (MoE) benchmark.

According to the release, “HGX-2 serves as a ‘building block’ for manufacturers to create some of the most advanced systems for HPC and AI.” Additionally, the release noted that the appliance “achieved record AI training speeds of 15,500 images per second on the ResNet-50 training benchmark.”
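
The quoted throughput can be translated into rough wall-clock time. This sketch rests on assumptions that do not come from the release: the benchmark trains on the ImageNet-1k training set (1,281,167 images), the 15,500 images per second rate is sustained for the whole run, and training follows a 90-epoch schedule commonly used for ResNet-50. The helper names are hypothetical.

```python
# ImageNet-1k (ILSVRC 2012) training-set size; assumes the ResNet-50
# benchmark trains on this dataset.
IMAGENET_TRAIN_IMAGES = 1_281_167


def seconds_per_epoch(images_per_second, dataset_size=IMAGENET_TRAIN_IMAGES):
    """Time for one full pass over the dataset at a sustained throughput."""
    return dataset_size / images_per_second


def hours_for_epochs(epochs, images_per_second):
    """Total training time in hours, ignoring startup, evaluation, and I/O stalls."""
    return epochs * seconds_per_epoch(images_per_second) / 3600


print(round(seconds_per_epoch(15_500), 1))     # 82.7
print(round(hours_for_epochs(90, 15_500), 2))  # 2.07
```

Under those assumptions, one epoch takes about 83 seconds and a 90-epoch run about two hours; real runs add data loading, evaluation, and warm-up overhead on top of this estimate.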

Lenovo, QCT, Supermicro and Wiwynn have committed to shipping HGX-2-based products this year, while Foxconn, Inventec, Quanta, and Wistron are working on HGX-2-based systems for cloud data centers, according to a report from our sister site ZDNet.

NVIDIA has not disclosed pricing information for the technology, though given the enterprise market and use cases it is intended for, it may be out of reach for cryptocurrency miners.

The big takeaways for tech leaders:

  • The new NVIDIA HGX-2 delivers 2 petaflops for tensor operations, has 512 GB of RAM, and offers a bisection bandwidth of 2,400 GB/s.
  • The HGX-2's speedup is enabled by its NVSwitch fabric, which connects every GPU over NVLink, a protocol devised by NVIDIA to overcome the limitations of PCI Express.