MIT's automated machine learning works 100x faster than human data scientists

Auto Tune Models (ATM) is an automated system that beats human-designed machine learning solutions for 30% of datasets tested.

Building a slide deck, pitch, or presentation? Here are the big takeaways:
  • An automated machine learning platform called Auto Tune Models (ATM) from MIT and Michigan State University uses cloud-based, on-demand computing to speed data analysis. -MIT and Michigan State University, 2017
  • ATM was able to deliver a solution better than the one humans had come up with 30% of the time, and could do this 100x faster. -MIT and Michigan State University, 2017

A new automated machine learning system can analyze data and come up with a solution 100x faster than humans, according to a new paper from MIT and Michigan State University. This could potentially help businesses take advantage of machine learning's capabilities in a faster, easier way, while also filling data science talent gaps.

The system also potentially marks a tipping point in machine learning adoption in the enterprise, which is expected to double in 2018, as TechRepublic's sister site ZDNet reported.

When seeking a solution to a problem, data scientists must wade through huge datasets and choose the modeling technique they believe will work best. The issue is that there are hundreds of techniques to choose from, including neural networks and support vector machines. Choosing the best one could mean the difference between millions of dollars in ad revenue and none, or between catching a flaw in a medical device and missing it.

Researchers from MIT and Michigan State University recently presented a paper on Auto Tune Models (ATM) at the IEEE International Conference on Big Data. The paper demonstrated how the new automated system can select a modeling technique that, in some cases, outperforms the one humans chose.


ATM uses cloud-based, on-demand computing to perform a high-throughput search, and find the best possible modeling technique for a given problem, according to an article from MIT News. The system also adjusts the model's hyperparameters, or the values that specify how the model will be trained, to gain the best results.
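
As a rough illustration of that kind of search, the Python sketch below samples a modeling technique and a hyperparameter value at random, scores each candidate on held-out data, and keeps the best one. Everything in it is a hypothetical stand-in rather than ATM's actual code or API: the toy dataset, the two simple techniques (`stump` and `knn`), and the function names are assumptions made for the example, and plain random search is used only because it is the simplest strategy to show.

```python
import random

def make_data(rng, n):
    """Build a toy 1-D dataset (hypothetical): the label is 1 when x > 0.6."""
    xs = [rng.random() for _ in range(n)]
    return [(x, int(x > 0.6)) for x in xs]

def stump_predict(train, x, threshold):
    """Threshold classifier; its one hyperparameter is the cut point."""
    return int(x > threshold)

def knn_predict(train, x, k):
    """k-nearest-neighbors majority vote; its one hyperparameter is k."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)

# Each technique maps to (predict function, hyperparameter sampler).
FAMILIES = {
    "stump": (stump_predict, lambda rng: rng.uniform(0.0, 1.0)),
    "knn": (knn_predict, lambda rng: rng.choice([1, 3, 5, 7])),
}

def accuracy(predict, hp, train, valid):
    """Fraction of validation points the candidate labels correctly."""
    hits = sum(predict(train, x, hp) == y for x, y in valid)
    return hits / len(valid)

def search(train, valid, trials, seed=0):
    """Random search over techniques and hyperparameters; returns the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        name = rng.choice(sorted(FAMILIES))
        predict, sample = FAMILIES[name]
        hp = sample(rng)
        score = accuracy(predict, hp, train, valid)
        if best is None or score > best[0]:
            best = (score, name, hp)
    return best

if __name__ == "__main__":
    data_rng = random.Random(1)
    train = make_data(data_rng, 40)
    valid = make_data(data_rng, 40)
    print(search(train, valid, trials=50))
```

Calling `search(train, valid, trials=50)` returns a tuple holding the best validation accuracy, the name of the winning technique, and its hyperparameter value.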

Researchers tested the system against humans via the collaborative crowdsourcing platform open-ml.org, on which data scientists work together to solve problems. ATM analyzed 371 datasets from the platform. The system was able to come up with a solution better than the one humans had developed 30% of the time, researchers found.

ATM also worked much more quickly than the humans could: It took human open-ml users an average of 200 days to deliver a solution, while it took ATM less than a day to create a better-performing model.

ATM can augment the work of data scientists, and offer more peace of mind that they are selecting the right model, Arun Ross, professor in the Computer Science and Engineering department at Michigan State University, and a senior author on the paper, told MIT News.

"There are so many options," Ross told MIT News. "If a data scientist chose support vector machines as a modeling technique, the question of whether she should have chosen a neural network to get better accuracy instead is always lingering in her mind."

ATM searches through techniques by testing thousands of models in parallel, evaluating each, and allocating more computational resources to the ones that best fit the problem. Then, the system displays its results as a distribution, so researchers can compare different methods. Therefore, it is not trying to automate the human out of the process, Ross told MIT News.
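
One generic way to steer more computation toward promising techniques is a multi-armed bandit rule such as UCB1, sketched below in Python. This is an illustration under stated assumptions, not the researchers' implementation: the technique names, their average accuracies, and the simulated evaluation step are all made up for the example.

```python
import math
import random

# Hypothetical techniques with made-up average accuracies (assumptions).
HYPOTHETICAL_ARMS = {"svm": 0.70, "neural_net": 0.85, "decision_tree": 0.60}

def ucb1_allocate(arms, budget, seed=0):
    """Spend `budget` evaluations across techniques using the UCB1 rule.

    Returns a dict mapping each technique to the number of evaluations
    it received.
    """
    rng = random.Random(seed)
    pulls = {name: 0 for name in arms}
    total = {name: 0.0 for name in arms}
    for t in range(1, budget + 1):
        untried = [name for name in arms if pulls[name] == 0]
        if untried:
            # Evaluate every technique once before comparing them.
            choice = untried[0]
        else:
            # Pick the highest mean score plus exploration bonus.
            choice = max(
                arms,
                key=lambda n: total[n] / pulls[n]
                + math.sqrt(2 * math.log(t) / pulls[n]),
            )
        # Simulated evaluation: a noisy score around the assumed average,
        # clipped to the range [0, 1].
        reward = min(1.0, max(0.0, rng.gauss(arms[choice], 0.05)))
        pulls[choice] += 1
        total[choice] += reward
    return pulls

if __name__ == "__main__":
    print(ucb1_allocate(HYPOTHETICAL_ARMS, budget=1000))
```

In this sketch every technique keeps receiving some evaluations, but the one with the highest observed scores ends up with the largest share of the budget.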

Streamlining the model selection process with automation allows data scientists to work on more complex parts of the problem, researchers noted. "We hope that our system will free up experts to spend more time on data understanding, problem formulation and feature engineering," Kalyan Veeramachaneni, principal research scientist at MIT's Laboratory for Information & Decision Systems and co-author of the paper, told MIT News.

ATM is currently available for enterprises as an open source platform. It can run on a single machine, local computing clusters, or on-demand clusters in the cloud, and can work with multiple datasets and multiple users at the same time, MIT noted. "A small- to medium-sized data science team can set up and start producing models with just a few steps," Veeramachaneni told MIT News.



About Alison DeNisco Rayome

Alison DeNisco Rayome is a Staff Writer for TechRepublic. She covers CXO, cybersecurity, and the convergence of tech and the workplace.
