Tackle cybercrime with data science using this five-point framework

NIST's framework for reducing cyber-related risks is a good starting point for strengthening your organization's security with data science.


If your organization wants to incorporate data science into its cybersecurity program, you may not know where to start. Fortunately, thanks to an agency of the US Department of Commerce, you don't have to start from a blank slate.

When considering a data science solution to cybersecurity, you might want to start with the National Institute of Standards and Technology's (NIST) Framework for Improving Critical Infrastructure Cybersecurity.

Inside the NIST framework

NIST's framework is easy to understand, yet detailed and prescriptive enough that you can quickly picture how it might work in your organization. It also opens the door to a lot of data science opportunities that we can explore.

At its core, the framework comprises five basic functions: Identify, Protect, Detect, Respond, and Recover.

Identify your assets

The Identify function is largely concerned with knowing what assets need to be protected and the risk analysis required to assign the highest priorities to your highest-risk assets.

Although asset identification should be a fairly routine operation within your company, the risk management aspect is more interesting from a data science perspective. You'll need to risk-rank your assets based on probability of attack (POA) and consequence of breach (COB). It may be wise to task your data science team with developing a quantitative model to help in this endeavor.
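As a rough illustration of what such a quantitative model might look like in its simplest form, the Python sketch below scores each asset as POA multiplied by COB and sorts by that score. The asset names, probabilities, and dollar figures are made-up placeholders, and treating risk as a plain expected loss is an assumption for this example, not something the NIST framework prescribes.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    poa: float  # probability of attack over some period, 0 to 1 (illustrative estimate)
    cob: float  # consequence of breach, e.g. estimated cost in dollars (illustrative)

    @property
    def risk(self) -> float:
        # Assumed scoring rule: expected loss = probability x consequence.
        return self.poa * self.cob


def rank_assets(assets):
    """Return the assets sorted from highest to lowest risk score."""
    return sorted(assets, key=lambda a: a.risk, reverse=True)


# Hypothetical inventory; every number here is a placeholder.
assets = [
    Asset("public website", poa=0.60, cob=50_000),
    Asset("HR database", poa=0.20, cob=900_000),
    Asset("build server", poa=0.05, cob=200_000),
]

ranked = rank_assets(assets)
for asset in ranked:
    print(f"{asset.name}: {asset.risk:,.0f}")
```

Even a toy model like this makes a useful point: the asset attacked most often (the website) is not necessarily the one at the top of the list. A real model would likely replace the point estimates with distributions and revisit them as new attack data arrives.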

Protect your information

The Protect function is all about never letting attackers through the front door. Web Application Firewalls (WAFs) have become very effective at deflecting cyberattacks. That might be fine for your web applications, but cybercriminals are likely more interested in your data, and a WAF will do little good if the bad guys find a backdoor to your HR database.

Data science has a more direct application in the Protect function. Pattern recognition and classification algorithms are where your data scientists should focus. Assets with a high POA will give your data scientists lots of data to profile. You can also use internal cyberexperts to simulate attacks, so your data scientists can build reliable signatures to block.
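To make the idea of building signatures from simulated attacks more concrete, here is a deliberately simple Python sketch. It assumes you have two sets of logged requests, one from normal traffic and one from your internal team's simulated attacks, and it keeps any token that shows up in several attack samples but never in normal traffic. The function names, the sample requests, and the `min_support` threshold are all hypothetical; a production classifier would be considerably more sophisticated.

```python
import re


def tokens(request: str) -> set:
    """Split a request string into lowercase word-like tokens."""
    return set(re.findall(r"[a-z0-9_]+", request.lower()))


def build_signatures(attack_samples, benign_samples, min_support=2):
    """Keep tokens seen in at least `min_support` attack samples and in no benign sample."""
    benign_vocab = set()
    for request in benign_samples:
        benign_vocab |= tokens(request)

    counts = {}
    for request in attack_samples:
        for token in tokens(request) - benign_vocab:
            counts[token] = counts.get(token, 0) + 1
    return {token for token, n in counts.items() if n >= min_support}


def should_block(request: str, signatures: set) -> bool:
    """Block a request if it contains any learned signature token."""
    return bool(tokens(request) & signatures)


# Hypothetical training data: normal traffic and simulated attacks.
benign = [
    "GET /employees?id=42",
    "GET /employees?name=smith",
    "POST /login user=alice",
]
attacks = [
    "GET /employees?id=1 UNION SELECT password FROM users",
    "GET /employees?name=x' UNION SELECT ssn FROM employees",
]

signatures = build_signatures(attacks, benign)
print(sorted(signatures))
print(should_block("GET /employees?id=7 UNION SELECT 1", signatures))
print(should_block("GET /employees?id=7", signatures))
```

A signature set this crude will produce false positives (any legitimate request containing the word "from" would be blocked), which is exactly why the quality and volume of the profiled data matter.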

Detect the intruders

The Detect function is about recognizing the intruder once they make it past your front door. This is where I spent most of my time with a recent client. They were pretty comfortable with their protection schemes; however, they were very concerned about what would happen if someone actually made it past their defenses.

In my opinion, the best data science approach within the Detect function is an expert system. Classification routines, neural networks, and anything else that needs to be trained require a reasonable set of defect examples to learn from. Since a defect in this case means an actual breach, it's likely you don't have any, and you definitely don't want to invite one! So, a rule-based expert system would probably be the best place to start.
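A rule-based system needs no breach history, only rules written down by people who know what suspicious activity looks like. The Python sketch below shows one minimal way this might be structured: each rule is a named condition with a score, and an event raises an alert when the combined score of the rules it triggers crosses a threshold. The rules, scores, event fields, and threshold are invented for illustration and would need to come from your own security experts.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # returns True when the rule fires
    score: int                         # how suspicious a match is (assumed weights)


# Hypothetical rules; real ones would be elicited from security staff.
RULES = [
    Rule("off-hours login", lambda e: e["hour"] < 6 or e["hour"] >= 22, 2),
    Rule("bulk read", lambda e: e["rows_read"] > 10_000, 3),
    Rule("unfamiliar host", lambda e: e["host"] not in e["known_hosts"], 2),
]

ALERT_THRESHOLD = 4  # assumed cutoff for raising an alert


def evaluate(event: dict):
    """Return (alert, names of fired rules) for a single activity event."""
    fired = [rule for rule in RULES if rule.condition(event)]
    total = sum(rule.score for rule in fired)
    return total >= ALERT_THRESHOLD, [rule.name for rule in fired]


# Two made-up events: one suspicious, one routine.
suspicious = {"hour": 3, "rows_read": 250_000,
              "host": "10.0.0.7", "known_hosts": {"10.0.0.7"}}
routine = {"hour": 14, "rows_read": 120,
           "host": "10.0.0.7", "known_hosts": {"10.0.0.7"}}

print(evaluate(suspicious))
print(evaluate(routine))
```

Because the rules are explicit, analysts can read, challenge, and tune them, which is harder to do with a trained model. If real incident data accumulates over time, the same events could later feed a trained classifier.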

Respond to threats

Once you know someone's in your house, you need to figure out what to do with him or her, which is where the Respond function comes in -- it forces you to plan in advance what your strategy would be if a cybercriminal made it through your defenses.

I remember having conversations with some of the DBAs in the older, legacy systems. A common comment was, "Even if they got access to our data, there's no way they could get out with our data."

Most of the Respond function is action-oriented with little obvious data science application; however, there is an analytical component to the Respond function that's worthy of your data scientists' attention. Once the event is over -- with an outcome hopefully in your favor -- the breach needs to be analyzed to understand why the Protect function failed. Although the circumstances are not ideal, this information is valuable for your data scientists to compare against the Protect data profiles.

Recover your business

The Recover function addresses how you bounce back from a successful cyberattack.

Imagine if a cyberattacker successfully breached your data warehouse and replaced all the foreign keys with garbage data. As you know, that would render your data warehouse useless, which is why it's a common objective for cybercriminals.

Although there's no obvious analytical application to this function, your data scientists can still come in handy. Remember, data science is more than just analytics. Resilience and business continuity are very familiar playgrounds for data scientists, so make sure they're consulted when considering how to recover from a cyberattack. I'm sure they'll have good ideas that have not been considered.


Download NIST's Framework for Improving Critical Infrastructure Cybersecurity (PDF), and discuss it with your data science team using the ideas in this article as a starting point. It's best to get a grip on your information before the bad guys do.
