MIT shows how AI cybersecurity excels by keeping humans in the loop

A new paper from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and PatternEx shows that their AI system predicts cyber-attacks better than either machine learning systems or human experts working alone.


Cybersecurity threats are among the most pressing concerns for businesses and institutions that need to protect information, but today's security systems are limited. Most security systems fall into one of two categories: those that rely on human analysts and those that rely on machine learning. Now, a new research paper from MIT shows that combining human experts with a machine learning system, so that the analysts supply the labels that supervise the model, produces better results than either humans or machines alone.

"AI squared," which uses a system developed by PatternEx, is 10 times better at catching threats than machine learning alone, and it reduces false positives by a factor of five. According to MIT's researchers, that is three times better than current benchmarks. The name AI squared comes from the combination of two ideas, said Ignacio Arnaldo, a former CSAIL postdoc who is now chief data scientist at PatternEx: artificial intelligence and analysts' intuition.

"The domain has not used the potential of machine learning for its solutions," said Arnaldo. "The analysts haven't been involved in the loop. But we need an expert to analyze the data, to see how malicious the threats are. And we need experts to annotate data so we can incorporate the feedback into the system." Current machine learning systems, which typically rely on anomaly detection, produce more false positives.


So how does it work? The system is fed data and uses machine learning to flag suspicious activity. Human analysts then review the flagged events and judge whether they are real attacks. Their feedback goes back into the system, so it keeps improving.
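The article does not publish the system's code, but the feedback loop it describes can be illustrated with a rough sketch. The version below, with hypothetical function names throughout, ranks events using a simple unsupervised outlier score, shows the top `k` to an analyst (simulated here by a callable), and retrains a supervised model on the accumulated labels each round. The z-score outlier detector, the nearest-centroid classifier, and the 50/50 score blend are stand-ins chosen for brevity; they are assumptions, not PatternEx's actual models.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]  # hypothetical: one numeric feature vector per event


def outlier_scores(events: List[Vector]) -> List[float]:
    """Unsupervised step: mean absolute z-score of each event across features."""
    n, d = len(events), len(events[0])
    means = [sum(e[j] for e in events) / n for j in range(d)]
    stds = [math.sqrt(sum((e[j] - means[j]) ** 2 for e in events) / n) or 1.0
            for j in range(d)]
    return [sum(abs(e[j] - means[j]) / stds[j] for j in range(d)) / d for e in events]


def centroid(vectors: List[Vector]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train_model(events: List[Vector],
                labels: Dict[int, bool]) -> Optional[Callable[[Vector], float]]:
    """Supervised step: nearest-centroid classifier fit to analyst labels.

    Returns None until the analyst has labeled at least one attack and one
    benign event, because both centroids are needed."""
    attacks = [events[i] for i, is_attack in labels.items() if is_attack]
    benign = [events[i] for i, is_attack in labels.items() if not is_attack]
    if not attacks or not benign:
        return None
    c_att, c_ben = centroid(attacks), centroid(benign)

    def p_attack(e: Vector) -> float:
        d_att, d_ben = math.dist(e, c_att), math.dist(e, c_ben)
        total = d_att + d_ben
        return 0.5 if total == 0 else d_ben / total  # closer to attacks -> nearer 1
    return p_attack


def human_in_the_loop(events: List[Vector], analyst: Callable[[int], bool],
                      rounds: int = 3, k: int = 5) -> Dict[int, bool]:
    """Each round: score events, ask the analyst about the top k unlabeled ones,
    then retrain on all labels collected so far."""
    labels: Dict[int, bool] = {}
    unsup = outlier_scores(events)
    top = max(unsup) or 1.0
    for _ in range(rounds):
        model = train_model(events, labels)
        scores = [unsup[i] / top if model is None
                  else 0.5 * unsup[i] / top + 0.5 * model(events[i])
                  for i in range(len(events))]
        candidates = sorted((i for i in range(len(events)) if i not in labels),
                            key=lambda i: scores[i], reverse=True)
        for i in candidates[:k]:
            labels[i] = analyst(i)  # analyst feedback becomes training data
    return labels


# Usage with synthetic data: 100 benign events clustered near (1, 1), 5 attacks near (10, 10).
benign = [[1.0 + 0.01 * (i % 10), 1.0 + 0.01 * (i // 10)] for i in range(100)]
attacks = [[10.0 + 0.1 * i, 10.0] for i in range(5)]
events = benign + attacks
labels = human_in_the_loop(events, analyst=lambda i: i >= 100)
print(sorted(i for i, is_attack in labels.items() if is_attack))  # [100, 101, 102, 103, 104]
```

In this toy run the outlier score surfaces the five attacks in the first round, and the later rounds collect benign labels so that the supervised model has both classes to learn from. Blending the two scores is one simple way to let analyst feedback reshape the ranking over time.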

"You can think about the system as a virtual analyst," said CSAIL research scientist Kalyan Veeramachaneni, who developed the system with Arnaldo. "It continuously generates new models that it can refine in as little as a few hours, meaning it can improve its detection rates significantly and rapidly."

Who will benefit most from this kind of system? According to Arnaldo, companies that already have security analysts who can transfer their knowledge into the machine learning system. "When they do an investigation, the next time around the system can predict threats," said Arnaldo. The system also benefits companies that don't have analysts: it can transfer information about patterns detected at another company to their own system.


Part of what makes this task difficult is labeling the data. While many human labelers may be good at generic tasks, labeling cybersecurity threats requires experts in the field. "The average person on a crowdsourcing site like Amazon Mechanical Turk simply doesn't have the skillset to apply labels like 'DDoS' or 'exfiltration attacks,'" said Veeramachaneni.

MIT's researchers said that the system "can scale to billions of log lines per day, transforming the pieces of data on a minute-by-minute basis into types of behavior that are eventually deemed 'normal' or 'abnormal.'"
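To make "transforming the pieces of data on a minute-by-minute basis into types of behavior" more concrete, here is a minimal single-machine sketch that buckets raw log lines by entity and minute and summarizes each bucket as a feature vector. The `LogLine` schema, the `minute_features` function, and the three features (line count, bytes sent, distinct destinations) are hypothetical choices for illustration; a system handling billions of lines per day would run this kind of aggregation on a distributed stream processor.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class LogLine:
    timestamp: int   # seconds since epoch (hypothetical schema)
    entity: str      # e.g., a source IP address or user name
    dest: str        # destination host
    bytes_out: int   # bytes sent by the entity in this event


def minute_features(lines: Iterable[LogLine]) -> Dict[Tuple[str, int], List[float]]:
    """Aggregate raw log lines into one feature vector per (entity, minute).

    Features: [number of log lines, total bytes sent, distinct destinations]."""
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    volume: Dict[Tuple[str, int], int] = defaultdict(int)
    dests: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for line in lines:
        key = (line.entity, line.timestamp // 60)  # minute bucket
        counts[key] += 1
        volume[key] += line.bytes_out
        dests[key].add(line.dest)
    return {key: [float(counts[key]), float(volume[key]), float(len(dests[key]))]
            for key in counts}


logs = [
    LogLine(0, "10.0.0.5", "a.example", 500),
    LogLine(30, "10.0.0.5", "b.example", 700),
    LogLine(75, "10.0.0.5", "a.example", 200),
    LogLine(10, "10.0.0.9", "c.example", 90_000),
]
features = minute_features(logs)
print(features[("10.0.0.5", 0)])  # [2.0, 1200.0, 2.0]
```

The resulting vectors are the kind of per-entity behavior summaries that an outlier detector or a classifier could then score as "normal" or "abnormal."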


"The more attacks the system detects, the more analyst feedback it receives, which, in turn, improves the accuracy of future predictions," Veeramachaneni said. "That human-machine interaction creates a beautiful, cascading effect."

Arnaldo said that what PatternEx has provided is access to real-world data. "Unless you have that access," he said, "you can't truly develop these methods. Startups like PatternEx are the perfect environment to develop this kind of technology."

Roman Yampolskiy, director of the Cyber Security Lab at the University of Louisville, agrees that the hybrid approach is "known to produce superior results in many domains (ex. chess, prediction markets, etc.) by combining the machine's ability to quickly crunch numbers and process big data sets with human intuition." MIT's system "is a great example of a clever way to merge human and machine on a common task," said Yampolskiy.

"As machine learning algorithms improve, and the ability to learn from the human experts increases, we will likely reduce reliance on the analyst and achieve the same level of performance in a fully automated system," he said.


About Hope Reese

Hope Reese is a Staff Writer for TechRepublic. She covers the intersection of technology and society, examining the people and ideas that transform how we live today.
