Assembling strong data sets and developing domain expertise are more important than choosing an algorithm.
When artificial intelligence is the right tool to improve security, the most important step is not choosing the right algorithm for the job. Security teams and data scientists should start with strong data sets and a clear understanding of the business goal.
Zulfikar Ramzan, chief technology officer of RSA, moderated a panel on cryptography at RSA 2020 and shared this advice. Ramzan leads technology strategy for the organization and holds more than 50 patents. He said that AI and machine learning have been part of the security world for more than a decade. The difference now is that people are talking about the tool instead of the problem.
"The focus became the mechanism for solving the problem instead of the actual problem," he said. "AI is a how rather than a why."
Ramzan also said many AI systems were not meant to work in adversarial environments where the rules change all the time.
"AI is not built with the assumption of a sentient adversary," he said.
Making an algorithm work in the real world for a customer involves many subtle challenges, including choosing functionality over elegance.
"It's not always pretty but it works better, and that's what matters to the customer," he said.
Get the sequence right
Here is Ramzan's advice on how to operationalize AI.
- Identify quality data sets for training and validation
- Establish domain expertise
- Select the features for the analysis
- Choose an algorithm
Before selecting an algorithm to analyze data, companies have to know what they want to learn from the data. Ramzan used the example of determining whether a website was benign or malicious. One of the simplest ways to determine that is to look up the registration date of the domain name. A newly registered domain raises more suspicion than one that has been around a while.
"That one bit of information alone gives you a really nice advantage in being able to determine if the website is good or bad," he said. "No data set is going to tell you to look at that."
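To make the idea concrete, here is a minimal sketch of how that single bit could be computed as a feature. It assumes the domain's registration date has already been looked up (for example, from WHOIS records, which this sketch does not do), and the 30-day cutoff is a hypothetical value chosen for illustration, not one Ramzan gave.

```python
from datetime import date

# Hypothetical cutoff: domains younger than this many days count as "new".
NEW_DOMAIN_DAYS = 30


def domain_age_days(registered_on: date, today: date) -> int:
    """Return how many days have passed since the domain was registered."""
    return (today - registered_on).days


def is_newly_registered(registered_on: date, today: date,
                        threshold_days: int = NEW_DOMAIN_DAYS) -> bool:
    """One-bit feature: True if the domain is younger than the cutoff."""
    return domain_age_days(registered_on, today) < threshold_days


if __name__ == "__main__":
    today = date(2020, 2, 27)
    print(is_newly_registered(date(2020, 2, 20), today))  # a week-old domain
    print(is_newly_registered(date(2005, 6, 1), today))   # a long-lived domain
```

The value of a feature like this comes from knowing that attackers tend to spin up fresh domains, which is the kind of insight a raw data set does not surface on its own.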
This is where domain expertise comes in. Machine learning should be applied
"Once you've got those two things right, the algorithm you choose to make sense of the data is almost immaterial," he said. "Most of the good algorithms will perform within 1% or 2% of each other on a given day."
If two algorithms result in significantly different outcomes, that is probably a sign that something is wrong earlier in the process, he said.
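One way a team might act on that advice is a simple sanity check that scores two models on the same validation labels and flags a large gap. This is a sketch under assumptions: it measures plain accuracy, and it reads Ramzan's "1 or 2%" as a 2-percentage-point tolerance, which is an interpretation rather than a rule he stated. The labels and predictions below are made up for illustration.

```python
def accuracy(predictions: list[int], labels: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def results_diverge(acc_a: float, acc_b: float, tolerance: float = 0.02) -> bool:
    """True if two models differ by more than the (assumed) tolerance."""
    return abs(acc_a - acc_b) > tolerance


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    preds_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # hypothetical model A output
    preds_b = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]  # hypothetical model B output
    acc_a, acc_b = accuracy(preds_a, labels), accuracy(preds_b, labels)
    if results_diverge(acc_a, acc_b):
        print("Large gap: re-check the data and features before tuning models")
```

A large gap is treated here as a prompt to revisit the data and features, not as evidence that one algorithm is better.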
Ramzan said most people get the sequence backwards.
"They spend all their time on the algorithms and less time on features and nobody spends any time on the data," he said.
He also said AI is not the right choice for every security problem.
"Don't use AI to solve a problem you can solve with a signature; use AI for the harder part of the problem space," he said. "That is how to approach any system building. It's not just finding the fanciest tool; find the tool you need."
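A rough sketch of that division of labor: check a cheap signature first and hand only the undecided samples to a model. Everything here is hypothetical. The "signature" is the simplest possible kind, an exact SHA-256 hash match against a made-up list, and `model_score` stands in for whatever trained model a team actually uses.

```python
import hashlib
from typing import Callable

# Hypothetical signature list: SHA-256 digests of known-malicious samples.
KNOWN_BAD_SHA256 = {
    hashlib.sha256(b"known malicious payload").hexdigest(),
}


def classify(sample: bytes, model_score: Callable[[bytes], float],
             threshold: float = 0.5) -> str:
    """Use the signature when it applies; fall back to the model otherwise."""
    if hashlib.sha256(sample).hexdigest() in KNOWN_BAD_SHA256:
        return "malicious (signature match)"
    # Only samples the signature cannot decide reach the (more costly) model.
    if model_score(sample) >= threshold:
        return "suspicious (model)"
    return "benign (model)"


if __name__ == "__main__":
    # Toy stand-in for a trained model: flags code that calls eval().
    def toy_model(s: bytes) -> float:
        return 0.9 if b"eval(" in s else 0.1

    for sample in (b"known malicious payload", b"x = eval(input())", b"print('hi')"):
        print(classify(sample, toy_model))
```

The design keeps the known cases fast and deterministic, and reserves the model for the part of the problem space a signature cannot cover.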