Building a slide deck, pitch, or presentation? Here are the big takeaways:

  • Under the partnership, deep learning developers will have free access to NVIDIA’s architecture and ARM’s IoT chip expertise, and the open source design allows features to be added regularly, including contributions from the research community.
  • The NVIDIA DGX-2 is capable of delivering 2 petaflops of computational power and boasts the deep learning processing power of 300 servers in a single server.

NVIDIA has announced a partnership with Internet of Things (IoT) chip designer ARM, aimed at accelerating inferencing by making it simple for IoT chip companies to integrate artificial intelligence (AI) into their designs.

During his keynote at NVIDIA GTC in San Jose on Tuesday, company CEO and founder Jensen Huang explained that the partnership will see the open source NVIDIA Deep Learning Accelerator (NVDLA) architecture integrated into Arm’s Project Trillium for machine learning.

NVDLA is based on NVIDIA Xavier, touted by the GPU giant as being a “powerful autonomous machine system on a chip.” According to Huang, this will provide a free, open architecture to promote a standard way to design deep learning inference.

“Inferencing will become a core capability of every IoT device in the future,” NVIDIA vice president and general manager of Autonomous Machines, Deepu Talla, said during a press briefing.

“Our partnership with ARM will help drive this wave of adoption by making it easy for hundreds of chip companies to incorporate deep learning technology,” Talla added.

ARM, purchased by Japanese conglomerate SoftBank in 2016 for £24.3 billion, has a vision of connecting one trillion IoT devices and expects that many devices to exist by 2035.

Also announced at GTC on Tuesday was an 8x performance boost, compared with the previous generation, for the company’s latest deep learning compute platform.

The advancements, already adopted by major cloud vendors, include a two-fold memory boost to the NVIDIA Tesla V100 data center GPU; a new GPU interconnect fabric, NVIDIA NVSwitch, which enables up to 16 Tesla V100 GPUs to simultaneously communicate at a speed of 2.4 terabytes per second; and an updated software stack.

The Tesla V100 products will now boast 32GB of memory each, effective immediately, with Cray, HPE, IBM, Lenovo, Supermicro, and Tyan announcing the rollout of the V100 32GB in the second quarter, and Oracle Cloud Infrastructure expected to offer V100 32GB in the cloud in the second half of 2018.

“We are all in on deep learning and this is the result … we’re just picking up steam,” Huang said.

Huang also detailed a breakthrough in deep learning computing: the NVIDIA DGX-2.

A single server, the DGX-2 is capable of delivering 2 petaflops of computational power and boasts the deep learning processing power of 300 servers. It also received a 32GB upgrade.

“The extraordinary advances of deep learning only hint at what is still to come,” Huang said. “Many of these advances stand on NVIDIA’s deep learning platform, which has quickly become the world’s standard.”

The CEO said his company is enhancing its deep learning platform’s performance at a pace exceeding Moore’s law, enabling “breakthroughs that will help revolutionize healthcare, transportation, science exploration, and countless other areas.”

With Huang noting that GPU acceleration for deep learning inference is gaining traction, NVIDIA also unveiled a series of new technologies and partnerships that expand its inference capabilities for hyperscale data centers, offering support for capabilities such as speech recognition, natural language processing, recommender systems, and image recognition.

The announcement includes the integration of TensorRT (a high-performance deep learning inference optimizer and runtime that delivers low-latency, high-throughput inference for deep learning applications) into Google’s TensorFlow 1.7 framework.

This dramatically improves inferencing in TensorFlow: with TensorFlow 1.6, a single V100 could process about 300 images per second, while moving to 1.7 allows more than 2,600 images to be processed per second.
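
To put those two throughput figures in perspective, the short Python sketch below works out the implied speedup and the average GPU time spent per image. It uses only the rounded numbers quoted above ("about 300" and "over 2,600"), so the results are approximate. The function names are illustrative and do not come from NVIDIA or TensorFlow.

```python
# Throughput figures quoted in the article (images per second on a single V100).
# Both are rounded in the source, so treat the results as approximate.
TF_1_6_IMAGES_PER_SEC = 300    # "about 300" with TensorFlow 1.6
TF_1_7_IMAGES_PER_SEC = 2600   # "over 2,600" with TensorFlow 1.7 and TensorRT


def speedup(before: float, after: float) -> float:
    """Return how many times higher the `after` throughput is than `before`."""
    if before <= 0:
        raise ValueError("baseline throughput must be positive")
    return after / before


def ms_per_image(images_per_sec: float) -> float:
    """Average milliseconds per image implied by a throughput figure.

    This is an amortized number derived from throughput, not the
    latency of any single request.
    """
    if images_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / images_per_sec


if __name__ == "__main__":
    ratio = speedup(TF_1_6_IMAGES_PER_SEC, TF_1_7_IMAGES_PER_SEC)
    print(f"Speedup: roughly {ratio:.1f}x")
    print(f"TensorFlow 1.6: {ms_per_image(TF_1_6_IMAGES_PER_SEC):.2f} ms per image")
    print(f"TensorFlow 1.7: {ms_per_image(TF_1_7_IMAGES_PER_SEC):.2f} ms per image")
```

On the quoted numbers, that works out to a gain of a little under 9x, with the average time per image falling from about 3.3 milliseconds to under 0.4 milliseconds.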

TensorRT 4, the latest generation Huang unveiled on Tuesday, is said to deliver up to 190X faster deep learning inference compared with CPUs for common applications such as computer vision, neural machine translation, automatic speech recognition, and speech synthesis.

NVIDIA also announced that the Kaldi speech recognition framework has been optimized for GPUs, allowing for “more useful” virtual assistants for consumers and lower deployment costs for data center operators.

Disclaimer: Asha McLean travelled to GTC as a guest of NVIDIA