Hadoop is big but won't go mainstream until it resolves its glaring security issues.
Hadoop may be hitting its stride within the enterprise, but it's hard to take the big data poster child seriously when so few enterprises care about its security. A year ago, Gartner analyst Merv Adrian lamented the "nearly non-existent response" to Hadoop's "security issue," calling it "shocking."
One year and a lot of adoption later, it's still the case that no one cares about Hadoop security. The question is why.
With Hadoop adoption finally taking off, shouldn't big data security interest be rising in tandem?
Hadoop goes big
Though enterprise CIOs have been talking up big data for years, relatively few have actually rolled out serious big data initiatives.
As a new Deutsche Bank CIO survey highlights, once-reticent CIOs are beginning to embrace Hadoop:
"CIOs are now broadly comfortable with the technology and see it as a significant part of the future data architecture. We would expect significant $ commitments in [fiscal year 2015]."
While the population of Hadoop adopters is still small (Gartner estimates that just 1,000 enterprises have Hadoop running in production), momentum is gathering. According to Gartner, just 15% of enterprise CIOs surveyed expect Hadoop to "play a significant role in most companies' analytical infrastructure" within the next two years, but that number jumps to 40% if we extend the timeframe out to four years.
This shift shows up in Gartner's analysis of big data plans. Between 2013 and 2014, enterprises significantly increased their production deployments of big data, which often feature Hadoop (Figure A).
Figure A: Production deployments of big data.
All of this is great for Hortonworks, Cloudera, and other companies selling software and support services for Hadoop. But those numbers start to look somewhat scary when set against how little interest CIOs show in Hadoop security.
Big data, little security
Early in 2014, even as Hadoop's hype hit overdrive, Adrian documented a distinct lack of concern about its security. He polled webinar participants on the biggest barriers to Hadoop adoption. The "shocking" answers, as BuzzFeed might say, may surprise you (Figure B).
Figure B: Biggest barriers to Hadoop adoption.
While the largest share of respondents cited Hadoop's "undefined value proposition," a mere 2% called out its security issues. This led Adrian to exclaim:
"[T]he nearly non-existent response to the security issue is shocking. Can it be that people believe Hadoop is secure? Because it certainly is not. At every layer of the stack, vulnerabilities exist, and at the level of the data itself there [are] numerous concerns. These include the use of external unveiled data and of data in file systems that lack any protection, and the separation of Hadoop initiatives in most organizations from IT governance. Add to that the kinds of use cases Hadoop is being pointed at: sensitive health care information; personal data in retail systems; telephone usage; social media connection and sentiment analytics - all of them give us pause."
It won't be any consolation to Adrian, but one year later, things haven't gotten any better. As Adrian's Gartner colleague Nick Heudecker tweeted this week, "Less than 5% of Hadoop inquiries covered by the Info Mgmt team in 2014 discussed security. This has to change in 2015."
Not everyone agrees, of course.
Cloudera's Justin Kestelyn, for example, argues, "Show me someone who doesn't care about Hadoop security, and I'll show you someone who doesn't understand what Hadoop can do." Kestelyn is probably correct, but he may simply be underlining how far Hadoop has to go.
Still fiddling with the knobs
It's important to remember that, for all its media hype, Hadoop has generally been confined to isolated clusters or data silos. Most enterprises, if they run it at all, are still running it in pilots or limited production. As a result, security isn't yet perceived as a pressing issue, and it won't be until Hadoop truly goes mainstream and becomes an essential element of enterprise data infrastructure.
Indeed, the lack of concern over Hadoop security is the clearest indication that it hasn't yet become critical infrastructure, even though all trends point in that direction.
Another way to look at Hadoop's (in)security is through the lens of innovation. The entire Hadoop ecosystem has been innovating at breakneck speed, with security a nice-to-have but not yet a must-have feature.
That will change. The Hadoop train has left the station, and it's clearly heading toward mainstream adoption. As it gets there, the Hadoop community will prioritize security, because customers will demand it. We're not there yet, but it's coming.