IT security has gone from a subject of nuanced technical discussion in IT leadership circles to a dinner-table conversation topic, as each week brings new front-page stories about security breaches and high-profile hacker attacks.
It's easy to get caught up in the fear and loathing of these discussions, and longtime readers may recall that I generally take a more sanguine approach to data security. I suggest regarding data security like insurance: determine your risk exposure and ideal coverage, determine how much you want to spend, then adjust the two until they meet at a balance you're comfortable with.
Just as we'd all love billion-dollar life insurance policies and "gold-plated" medical coverage but balk at the cost, we'd all love the latest and greatest IT security and a dedicated support staff but usually lack the means to pay for them.
For the most part, organizations have taken this approach, and the days of open access to transactional systems and default administrator passwords are largely behind us. However, Big Data presents some new security challenges that are worth investigating in a considered manner.
The great benefit of Big Data is that it consolidates massive data sets from diverse sources and performs complex analytics on them to solve a business problem. This is certainly a noble goal, but it requires that data be exported from stable and secure systems and loaded into a Big Data tool that may be relatively immature and newly installed.
Security is often an afterthought in rapidly maturing applications, and Big Data is no exception. Combine this with the tight timelines and high expectations of most Big Data projects, and you have data that was once kept under lock and key being thrown into systems with only the most rudimentary security.
Furthermore, as the name implies, Big Data will often include massive dumps of diverse data, so a theft from a single system yields a treasure trove that would otherwise have taken far more work to assemble. While state-sponsored attacks from a competing nation make for great press, far more likely is the lost laptop holding a copy of a Big Data dataset, or the disgruntled employee who can dump a massive amount of data that he or she would not normally have access to.
Big Data demands huge datasets and speedy analysis, but just as you can find an adequate compromise on insurance, you can find an appropriate balance between security and Big Data. Take the time to assess how critical the dataset being analyzed is and how sensitive the data in it are. You need not spend massive amounts of time and treasure securing public market data, but if your dataset includes confidential sales, customer, or employee information, security should be applied accordingly and should generally match that of the source systems.
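For readers who like to see such a triage written down, here is a minimal Python sketch of the idea, not a prescription. It assumes each source dataset can be tagged with a sensitivity tier and with the controls its source system already enforces. The tier names, the control labels, and the `BASELINE_CONTROLS` mapping are all hypothetical placeholders to be replaced with your own policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0        # e.g. public market data
    INTERNAL = 1
    CONFIDENTIAL = 2  # e.g. confidential sales, customer, or employee data


@dataclass(frozen=True)
class SourceDataset:
    name: str
    sensitivity: Sensitivity
    source_controls: frozenset  # controls the source system already enforces


# Hypothetical baseline controls per tier; substitute your own policy.
BASELINE_CONTROLS = {
    Sensitivity.PUBLIC: frozenset(),
    Sensitivity.INTERNAL: frozenset({"access_control"}),
    Sensitivity.CONFIDENTIAL: frozenset(
        {"access_control", "encryption_at_rest", "audit_logging"}
    ),
}


def required_controls(datasets):
    """Controls the consolidated Big Data store should enforce."""
    required = set()
    for ds in datasets:
        required |= BASELINE_CONTROLS[ds.sensitivity]
        # Non-public data should at least match its source system.
        if ds.sensitivity != Sensitivity.PUBLIC:
            required |= ds.source_controls
    return required


def missing_controls(datasets, target_controls):
    """Controls that are required but absent from the target system."""
    return required_controls(datasets) - set(target_controls)


if __name__ == "__main__":
    datasets = [
        SourceDataset("market_prices", Sensitivity.PUBLIC,
                      frozenset({"access_control"})),
        SourceDataset("customer_records", Sensitivity.CONFIDENTIAL,
                      frozenset({"access_control", "mfa"})),
    ]
    # A newly installed tool with only rudimentary security.
    gaps = missing_controls(datasets, {"access_control"})
    print(sorted(gaps))  # ['audit_logging', 'encryption_at_rest', 'mfa']
```

The point of the sketch is the shape of the decision rather than the particular labels: public data adds no requirements, while anything more sensitive inherits both a baseline and whatever its source system already enforced.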
Also, remember that a key element in any type of security is the human element. If you have neither the time nor the inclination to implement extensive security, ensure that staff with access to the data can be trusted and that they understand the nature of the data they're dealing with. Where consultants are involved, ask to see their data security policies, and ensure those policies are appropriate for the type of data the consultants will be able to access.
Finally, while few want to pause a highly visible Big Data effort, remind peers that security is a matter of taking prudent risks and applying protection appropriate to the level of risk. I might jump into the ocean for a quick swim with little more than shorts and a smile, but if I go scuba diving you had better believe I'm double-checking all my equipment and carrying backups of critical gear.