It's not surprising that most enterprises still don't know how to extract value from their data: vendors keep telling them half the story. Which half, of course, largely depends on whether it's an incumbent vendor trying to repurpose yesterday's data infrastructure for today's data sources or an upstart vendor trying to shoehorn today's data infrastructure into yesterday's data sources.
The companies that effectively tackle big data are those that aren't blinded by either camp and instead learn how to blend different data infrastructure to tackle a variety of data types and sources.
Our new/old data challenges
The good news is that an ever-increasing percentage of companies are advancing with big data projects, as Gartner survey data shows (Figure A):
Figure A: More companies are advancing with big data projects.
The bad news, however, is that most companies remain stymied by how to derive value from such projects.
Some of the fault lies with overenthusiastic but underprepared enterprises that hear the big data hype and rush to kickstart projects without having a clear plan.
But some of the fault also lies with vendors for initiating the hype.
It's not that vendors lie. Rather, channeling our inner Emily Dickinson, we "tell all the truth, but tell it slant." All vendors are complicit in this slant.
That slant would have you believe, for example, that since unstructured data now accounts for as much as 80% of all enterprise data, and unstructured data is growing at twice the rate of structured data, you should dump your data warehouse and go "all in" on Hadoop.
While it's certainly good practice to embrace Hadoop and its kissing cousin NoSQL, it's also the case that the vast majority of your data still sits in data warehouses, as 451 Research has shown (Figure B):
Figure B: The vast majority of your data still sits in data warehouses.
As for the data you're currently analyzing, well, most of it remains structured transactional data, as Gartner details (Figure C):
Figure C: Most of your data remains structured transactional data.
That same survey shows, of course, that enterprises are also embracing geospatial, social, free-form text, and other NoSQL-friendly data, but clearly they're choosing to use a mix of relational and NoSQL databases, EDWs, and Hadoop clusters.
And, not or
The problem with this truth is that it's subtle and, as Linux founder Linus Torvalds recently opined, "On the internet nobody can hear you being subtle." So, vendors from opposing camps launch broadsides against each other, even as end-customers would be better served by the different parties aligning.
Big data, after all, is a matter of "and," not "or," much of the time.
For example, companies that want to analyze customer churn will be better served by a combination of EDW and Hadoop technologies playing nicely together. Those that want to optimize retail pricing will likely benefit from marrying transactional analysis (RDBMS plus Hadoop) with location and social data (NoSQL).
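To make the "and, not or" point concrete, here is a minimal sketch of what blending the two worlds looks like in practice: structured, warehouse-style transaction records enriched with semi-structured, document-style social data. All names, schemas, and values are invented for illustration; real pipelines would pull from an actual EDW and a NoSQL store rather than in-memory lists.

```python
import json

# Structured, warehouse-style records (the kind an EDW/RDBMS would hold).
# Fields and values are hypothetical.
transactions = [
    {"customer_id": 1, "monthly_spend": 120.0, "churned": False},
    {"customer_id": 2, "monthly_spend": 15.0, "churned": True},
]

# Semi-structured social events (the kind a NoSQL document store would hold),
# arriving as JSON documents whose shapes vary from record to record.
social_events = [
    json.dumps({"customer_id": 2, "sentiment": "negative", "geo": [40.7, -74.0]}),
    json.dumps({"customer_id": 1, "sentiment": "positive"}),
]

# Blend the two sources: index the documents by customer, then enrich each
# transactional row with whatever social signal happens to be available.
by_customer = {json.loads(e)["customer_id"]: json.loads(e) for e in social_events}

enriched = []
for row in transactions:
    event = by_customer.get(row["customer_id"], {})
    enriched.append({**row, "sentiment": event.get("sentiment")})

for row in enriched:
    print(row["customer_id"], row["sentiment"], row["churned"])
```

The join itself is trivial; the point is that neither source alone answers the churn question, which is why the technologies need to play nicely together.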
Different types of data or workloads demand different big data technologies. It's common sense.
Of course, there will be times — quite often, in fact — when projects will be best served by using just one technology. But this decision should flow from the data itself and not the data management technology you already happen to own or know. In other words, just because you bought Oracle (or Hortonworks or Microsoft or DataStax or whatever) doesn't mean that is the technology you should apply.
You may have a hammer, but the data problem you're solving will not always be a nail.
Again, this is common sense, but common sense doesn't always factor into technology decisions. However, to be successful in big data, it must.
Matt is currently head of the developer ecosystem at Adobe. The views expressed are his own, not those of his employer.
Matt Asay is a veteran technology columnist who has written for CNET, ReadWrite, and other tech media. Asay has also held a variety of executive roles with leading mobile and big data software companies.