Hadoop is an open-source software
framework for storage and large-scale processing of datasets on clusters of
commodity hardware. It has become an increasingly essential tool for data scientists
looking to crack complex questions (e.g., “Will this person click on this funny
cat ad?”), and a slew of companies has embraced it as their own. Most of these
companies treat Hadoop as a cheap complement to their proprietary products,
rather than contributing meaningfully to its development.

Is this wrong? More importantly, is
it effective?

A new open-source holy war

After all, Hadoop is an Apache Software Foundation project, carrying a
license that essentially says, “Do whatever you want with this software, but
don’t blame me if it doesn’t work.” There is no requirement — moral or
otherwise — that developers contribute back.

Gartner analyst Merv Adrian captures this nicely:

“Having some components of your
solution stack provided by the open source community is a fact of life and a
benefit for all. So are roads, but nobody accuses Fedex or your pizza delivery
guy of being evil for using them without contributing some asphalt. Commercial
entities (including software and IT services providers) provide needed products
and services, employ people and pay taxes. We might want them to do more
charitable work or make more open source contributions, and some do, but they
are not morally obligated to do so.”

True enough. But as Red Hat has
long held, code is currency in open source. Whoever contributes the most code
to a given project has the most influence on that project and is best able to
steer it in a way that’s advantageous to their customers. This thought was echoed by
Hortonworks executive Shaun Connolly in the comment section of Adrian’s post:

“It is difficult to
drive a real enterprise-focused roadmap or fix/patch major issues if you don’t
have engineers working to make that happen within the community projects. And
if you’re doing your work off to the side of the community, then there’s no
clear path for those changes to work their way into the upstream community

Connolly’s point helps explain why,
three years ago, the Hadoop market was mostly concerned with who contributed
the most to its development. Today, that’s still an issue, but more attention is
being paid to who contributes the most to making Hadoop usable by mainstream
enterprises, given its complexity.


According to a new KPMG survey, 96% of CIOs and CFOs surveyed say
that they could do a better job deriving value from data through analytics, and
56% say at least some of the resulting benefits “left on the table”
could be significant. These C-level executives perhaps should care about a
vendor’s ability to get code into the Hadoop kernel — but arguably, they don’t.
Nearly 50% of attendees at a recent Gartner webinar cited Hadoop’s lack of a clear
value proposition as its biggest barrier to adoption.

In other words, they just want
someone to make sense of Hadoop.

Cloudera, for its part, has been
pitching its “enterprise data hub” strategy as a way to make Hadoop consumable
by mainstream enterprises. While Hortonworks has stuck to its strategy of
ensuring that all innovations around Hadoop are open source, Matt Brandwein,
director of Product Marketing at Cloudera, notes that Cloudera is “building out CDH [Cloudera’s
Hadoop distribution] — the open source foundation of our enterprise data hub
platform — along with the management tools, certifications, partner
integrations, and support that our customers require to deploy Hadoop for real
production use cases.”

By some measures, Cloudera’s
strategy has been more successful. Based on general interest (measured by Google search traffic) or jobs (measured by job postings), Cloudera is in the

And yet, over the past year, I’ve
heard from many sources that Hortonworks has been on a tear, winning new
customer accounts and growing revenues at a torrid pace.

Even so, it’s very possible that
neither will win.

The return of the incumbents

Cloudera and Hortonworks aren’t the only two Hadoop vendors in
the market. And according to some recent survey data from Gartner, they may not
be the vendors that enterprises prefer when looking to leverage
Hadoop. Instead, a majority of respondents (Figure A) want their tried-and-true BI vendors
to deliver Hadoop value.

Figure A: Results of a recent Gartner survey.


This isn’t heartening to the
pure-play Hadoop vendors, but it’s not surprising. Such customers don’t really care
about open-source bragging rights. They just want Hadoop tied into their
existing data infrastructure.

As Scott Gnau, president of
Teradata Labs, opines:

“[Hadoop is] not so
interesting that it’s open source. What’s really interesting is that it’s a way
to store data without making any change to the data, and store it in a detailed
fashion and process it in a massively parallel way.”

But even this innovation won’t be
interesting unless Hadoop vendors can close “the gap between the analysts and
the data,” as Gary Nakamura, CEO of Concurrent, puts it. Nakamura goes on to argue, “The
way to address this [gap] is to hide the complexity of Hadoop so that analysts
can get work done without having to become Hadoop experts.”

Does it matter if this is open
source? Not as much as we may think. The first priority is to get something
that works and makes life easier for mainstream analysts. Only once this core
problem is solved will anyone care about how open the software is.