Big data is still too difficult. Despite all the hype—and there has been lots and lots of hype—most enterprises still struggle to get value from their data. This led Dresner Advisory Services to conclude, "Despite an extended period of awareness building and hype, actual deployment of big data analytics is not broadly applicable to most organizations at the present time."
Some of this is a people problem. However persuasive the data, executives often prefer to ignore it. But a big part of the complexity in big data lies in the software required to grok it all. Though Spark and other, newer systems have improved the trajectory, big data infrastructure remains way too hard, a point made astutely by Jesse Anderson.
This stuff is hard
People have long loomed as one of the biggest impediments to big data adoption. A 2015 Bain & Co. survey of senior IT executives found that 59% believed their companies lack the capabilities to make sense (and business) of their data. Speaking specifically of Hadoop, Gartner analyst Nick Heudecker suggested that "Thru 2018, 70% of Hadoop deployments will not meet cost savings & revenue generation objectives due to skills & integration challenges." Skills matter, in other words, and are in short supply.
Over time the skills gap will decrease, of course, but understanding the average Hadoop deployment, for example, is non-trivial, as Anderson noted. In his words, the complexity of big data comes down to two primary factors: "you need to know 10 to 30 different technologies, just to create a big data solution," and "distributed systems are just plain hard."
The question is why.
Anderson schematically represented the complexity of a typical mobile application versus a Hadoop-backed application, noting that the latter involves double the number of "boxes," or components. Expressed in plain English, however, "The 'Hello World' of a Hadoop solution is more complicated than other domains' intermediate to advanced setups."
Compounding the difficulty, Anderson said, is the need to understand the wide array of systems involved. You might need to know 10 technologies to build a big data application, for example, but that likely requires you to have some familiarity with another 20 technologies simply to know which one to use in a given situation. Otherwise, how are you going to know to use MongoDB instead of HBase? Or Cassandra? Or Neo4j?
Add to this the complexity of running it all in a distributed system, and it's no wonder that the skills shortage for big data persists.
The easy way out
One way that enterprises are trying to minimize the complexity inherent in big data build-outs is by turning to the public cloud. According to a recent Databricks survey of Apache Spark users, deployment of Spark to the public cloud has ballooned 10% over the last year to 61% of total deployments overall. Instead of cumbersome, inflexible on-premises infrastructure, the cloud allows for flexibility and, hence, agility.
It does not, however, remove the complexity of the technologies involved. The same hard choices about this or that database or message broker remain.
Such choices, and the complexity therein, aren't going away anytime soon. Companies like Cloudera and Hortonworks have arisen to try to streamline those choices, tidying them up into stacks, but they still essentially provide tools that need to be understood in order to be useful. Amazon Web Services is going a step further with its Lambda service, which allows developers to focus on writing their application code while AWS takes care of all the underlying infrastructure.
But the next step is to pre-fab the application for the end user entirely, which is what former Wall Street analyst Peter Goldmacher dubbed a much bigger opportunity than selling infrastructure components. In his words, one major category of "winners [is] the Apps and Analytics vendors that abstract the complexity of working with very complicated underlying technologies into a user friendly front end. The addressable audience of business users is exponentially larger than the market for programmers working on core technology."
This is where the market needs to get to, and fast. We're nowhere near done. For every Uber that is able to master all the underlying big data technologies to upend industries, there are hundreds of traditional companies that simply want to reinvent themselves and need someone to make their data more actionable. We need this category of vendor to emerge. Now.
Matt Asay is a veteran technology columnist who has written for CNET, ReadWrite, and other tech media. Asay has also held a variety of executive roles with leading mobile and big data software companies.