Ultimately, Hadoop is a battle of
ecosystems, just as Linux was before it. Red Hat dominated the Linux market by
becoming the first-choice Linux distribution against which ISVs and IHVs certified. Hadoop,
still in flux, may well depend on similar dynamics.

Among the open-source Hadoop
vendors, Cloudera and Hortonworks have spent the last few years piling up
partnerships. Cloudera nabbed Oracle; Hortonworks got Microsoft. But this
partner ecosystem tit-for-tat took an interesting twist this past week when Red Hat threw its weight behind Hortonworks.

The Linux wars all over again

Red Hat’s dominance in Linux isn’t a matter of 1s and 0s. It’s
a function of partnerships and always has been.

Early on, Red Hat secured Oracle as a partner. Oracle just
wanted to lower the perceived price of its database by running on commodity
hardware and an open-source operating system. Red Hat, for its part, just
needed a heavyweight enterprise technology vendor to bless Red Hat Enterprise
Linux (then “Advanced Server”).

From that moment, SUSE — then and now the number two enterprise
Linux distribution — played catch-up. Red Hat started racking up certifications
for RHEL, and SUSE lagged behind. As someone who worked for Novell (SUSE) and
then Canonical, I experienced the effects of this firsthand. Risk-averse CIOs
came to know that their existing IT investments in Business Objects, Oracle, or
name-your-technology-of-choice would run first on RHEL.

Today, Cloudera and Hortonworks have also been pursuing a
land grab for marquee partners, trying to shore up their ecosystem credentials
and make Hadoop safe and easy for enterprise IT. While Red Hat and Hortonworks
have been working together for nearly a year, their
deepening alliance sends interesting signals.

Like attracts like in Hadoop land

On one hand, the two open-source purists getting together is
unexceptional. After all, as 451 Research’s Matt Aslett reminds us in a
research note, “Hortonworks and Red Hat are natural partners given the strong
commitment to open source on both sides.” All else being equal, Red Hat will
always choose to side with an open-source player. It’s in the company’s DNA.

This similar DNA between Red Hat and Hortonworks makes it
easier to engage in the three primary elements of the partnership as they jointly
try to sate enterprise demand for big data solutions:

  • A commitment to jointly engineered solutions to enable
    a seamless customer experience
  • Execution of joint go-to-market activities
  • Collaborative customer support

The question, however, is whether “open” matters most in big data and, in
particular, Hadoop.

Right market, wrong priorities?

As Red Hat CEO Jim Whitehurst told me in a recent phone
interview, much of the essential infrastructure behind big data, cloud, and
other industry trends is open source, setting the stage for a new wave of Red
Hat growth. But in the Hadoop market today, more than anything else, enterprises
want vendors to remove Hadoop’s complexity.

There is some evidence that enterprises are removing
their Hadoop training wheels, moving from basic ETL workloads to advanced
analytics workloads. But for most enterprises, big data is still a big mystery,
and Hadoop’s complexity contributes to this.

As such, the winning Hadoop vendor is likely going to be the
one that makes Hadoop easy, rather than the one that makes it open.

Cloudera gets this and has therefore been doubling down on its
“Enterprise Data Hub” messaging, simplifying its product offerings, and
innovating heavily around core Hadoop to make it easier for enterprises to digest.
For example, Cloudera built Apache Sentry, which delivers fine-grained
authorization to data stored in Apache Hadoop. Without it, every user sees
every other user’s data. Cloudera also built Hue, a standard way for Hadoop
distributions to provide programmatic UI help to application developers, which
is now bundled by all Cloudera’s competitors in their own distributions.

The list goes on, all of which paints a more complex picture than “Hortonworks is
100% open and Cloudera is not.” Mike Olson, chief strategy officer for
Cloudera, confirms this in an email to me:

“If you look back over the last
eighteen months or so, Cloudera has been driving new capabilities — security,
governance and compliance support — into the open-source projects. We’ve been adding
new real-time and analytics support — Impala, Search, machine learning via
Oryx, Spark.

“Our conviction is that big data needs
a central shared repository, with consistent security and management, and a
variety of engines for processing and analysis. All those engines must work on
the data in place.”

The problem for Red Hat, however, is that some of Cloudera’s
innovations, particularly around management/tooling for Hadoop, include
technology that it offers solely to paying customers and hasn’t released under
an open-source license. Hortonworks also creates a lot of supporting code for
Hadoop but releases all of it as open source.

Should this matter? Maybe. My hunch is that Red Hat wants an open-source safety
valve on its Hadoop partnership. Given the frothy hype around Hadoop, it’s
likely that big, proprietary incumbents like Oracle or HP could try to buy
Cloudera or Hortonworks. In such a case, essential functionality would be owned
by a potential Red Hat competitor rather than by an open-source community.

The real risk is non-adoption

Important as this consideration is, it strikes me as much more
important that Red Hat work with the Hadoop vendor that makes it easiest to
derive value from Hadoop — not necessarily the most open vendor.

This is not to suggest that Hortonworks doesn’t offer value
beyond its open-source bona fides. It does.

I simply believe that open source shouldn’t be the primary
driver of Red Hat’s big data strategy, any more than its early RHEL strategy
was driven by open source. RHEL won first and foremost because it delivered the
enterprise database of choice: Oracle. Next, it won by certifying thousands of
other proprietary technologies.

In conclusion, the first consideration for every big data vendor today should be to hide the underlying
technology as much as possible so as to surface real, easy-to-capture business
value. Openness is nice but not sufficient.