Microsoft partners with Cray to run deep learning algorithms on supercomputers

Microsoft and Cray recently unveiled a collaboration that scales the Microsoft Cognitive Toolkit to run on a Cray XC50 supercomputer, delivering results faster than before.

The latest collaboration between Microsoft and Cray could dramatically reduce the time it takes data scientists to train and run the models behind deep learning technologies. On Wednesday, at the 2016 Neural Information Processing Systems (NIPS) Conference in Spain, the two companies showed off their latest supercomputing work, which aims to broaden the usefulness and scale of deep learning algorithms.

According to a press release announcing the combined work, the partnership grew out of the idea that deep learning models take too long to train on conventional systems and architectures, which limits what can be accomplished with them.

To address the challenge, Microsoft and Cray teamed up with the Swiss National Supercomputing Centre (CSCS) to explore how they could use supercomputing to expand deep learning. As a result, the Microsoft Cognitive Toolkit was scaled to run on a Cray XC50 supercomputer, nicknamed "Piz Daint," that resides at CSCS.

The end result is that the time it takes to get actual results from deep learning algorithms is shortened dramatically. According to the release, the combined efforts can get results to data scientists in minutes or hours instead of weeks or months.

"With the introduction of supercomputing architectures and technologies to deep learning frameworks, customers now have the ability to solve a whole new class of problems, such as moving from image recognition to video recognition, and from simple speech recognition to natural language processing with context," the release stated.

This approach seems to work because deep learning problems share similarities, on an algorithmic level, with some of the applications typically reserved for supercomputers like the Cray XC50. And because more compute resources are available to the deep learning models, they can be trained much more quickly.

"What is most exciting is that our researchers and scientists will now be able to use our existing Cray XC supercomputer to take on a new class of deep learning problems that were previously infeasible," Thomas C. Schulthess, director of the CSCS, said in the press release.

The Cray supercomputer on which the Microsoft Cognitive Toolkit was scaled to run had 1,000 NVIDIA Tesla P100 GPU accelerators. In addition to speeding up traditional deep learning processes, the collaboration also opens up options for more complex and in-depth deep learning workloads in the future, the release said.

To further support deep learning in supercomputing, Cray is providing customers of its Cray XC series with deep learning toolkits, such as the Microsoft Cognitive Toolkit used in this collaboration.

"We are working to unlock possibilities around new approaches and model sizes, turning the dreams and theories of scientists into something real that they can explore," Mark S. Staveley, Cray's director of deep learning and machine learning, said in the press release.

In August, similar work began at the nonprofit artificial intelligence research company OpenAI. The company, backed by Elon Musk, became the first customer of the NVIDIA DGX-1, which is billed as the "world's first deep learning supercomputer in a box," and could help accelerate OpenAI's research.

The 3 big takeaways for TechRepublic readers

  1. Microsoft and Cray have partnered to run the Microsoft Cognitive Toolkit on a Cray XC50 supercomputer, to speed up training and running the models behind deep learning technologies.
  2. The Cray computer resides at the Swiss National Supercomputing Centre (CSCS) and runs 1,000 NVIDIA Tesla P100 GPU accelerators.
  3. The use of deep learning algorithms on supercomputers can speed up traditional deep learning workloads, and opens up possibilities for more complex workloads in the future.

About Conner Forrest

Conner Forrest is a Senior Editor for TechRepublic. He covers enterprise technology and is interested in the convergence of tech and culture.
