Red Hat pledges allegiance to Hortonworks in the battle of Hadoop ecosystems

Matt Asay thinks it's much more important that Red Hat work with the Hadoop vendor that makes it easiest to derive value from Hadoop rather than the most open vendor. Do you agree?


Red Hat and Hadoop

Ultimately, Hadoop is a battle of ecosystems, just as Linux was before it. Red Hat dominated the Linux market by becoming the first-choice Linux distribution against which ISVs and IHVs certified. Hadoop, still in flux, may well depend on similar dynamics.

Among the open-source Hadoop vendors, Cloudera and Hortonworks have spent the last few years piling up partnerships. Cloudera nabbed Oracle; Hortonworks got Microsoft. But this partner ecosystem tit-for-tat took an interesting twist this past week when Red Hat threw its weight behind Hortonworks.

The Linux wars all over again

Red Hat’s dominance in Linux isn’t a matter of 1s and 0s. It’s a function of partnerships and always has been.

Early on, Red Hat secured Oracle as a partner. Oracle just wanted to lower the perceived price of its database by running on commodity hardware and an open-source operating system. Red Hat, for its part, just needed a heavyweight enterprise technology vendor to bless Red Hat Enterprise Linux (then “Advanced Server”).

From that moment, SUSE -- then and now the number two enterprise Linux distribution -- played catch-up. Red Hat started racking up certifications for RHEL, and SUSE lagged behind. As someone who worked for Novell (SUSE) and then Canonical, I experienced the effects of this firsthand. Risk-averse CIOs came to know that their existing IT investments in Business Objects, Oracle, or name-your-technology-of-choice would run first on RHEL.

Today, Cloudera and Hortonworks have also been pursuing a land grab for marquee partners, trying to shore up their ecosystem credentials and make Hadoop safe and easy for enterprise IT. While Red Hat and Hortonworks have been working together for nearly a year, their deepening alliance sends interesting signals.

Like attracts like in Hadoop land

On one hand, the two open-source purists getting together is unexceptional. After all, as 451 Research’s Matt Aslett reminds us in a research note, “Hortonworks and Red Hat are natural partners given the strong commitment to open source on both sides.” All else being equal, Red Hat will always choose to side with an open-source player. It’s in the company’s DNA.

This similar DNA between Red Hat and Hortonworks makes it easier to engage in the three primary elements of the partnership as they mutually try to sate enterprise demand for big data solutions:

  • A commitment to jointly engineered solutions to enable a seamless customer experience
  • Execution of joint go-to-market activities
  • Collaborative customer support

The question, however, is whether “open” matters most in big data and, in particular, Hadoop.

Right market, wrong priorities?

As Red Hat CEO Jim Whitehurst told me in a recent phone interview, much of the essential infrastructure behind big data, cloud, and other industry trends is open source, setting the stage for a new wave of Red Hat growth. But in the Hadoop market today, more than anything else, enterprises want vendors to remove Hadoop’s complexity.

There is some evidence that enterprises are removing their Hadoop training wheels, moving from basic ETL workloads to advanced analytics workloads. But for most enterprises, big data is still a big mystery, and Hadoop’s complexity contributes to this.

As such, the winning Hadoop vendor is likely going to be the one that makes Hadoop easy, rather than the one that makes it open.

Cloudera gets this and has therefore been doubling down on its “Enterprise Data Hub” messaging, simplifying its product offerings, and innovating heavily around core Hadoop to make it easier for enterprises to digest. For example, Cloudera built Apache Sentry, which delivers fine-grained authorization to data stored in Apache Hadoop. Without it, every user sees every other user's data. Cloudera also built Hue, a standard way for Hadoop distributions to provide programmatic UI help to application developers, which is now bundled by all of Cloudera’s competitors in their own distributions.

The list goes on, all of which paints a more complex picture than “Hortonworks is 100% open and Cloudera is not.” Mike Olson, chief strategy officer for Cloudera, confirms this in an email to me:

“If you look back over the last eighteen months or so, Cloudera has been driving new capabilities -- security, governance and compliance support -- into the open-source projects. We've been adding new real-time and analytics support -- Impala, Search, machine learning via Oryx, Spark.

“Our conviction is that big data needs a central shared repository, with consistent security and management, and a variety of engines for processing and analysis. All those engines must work on the data in place.”

The problem for Red Hat, however, is that some of Cloudera’s innovations, particularly around management/tooling for Hadoop, include technology that it offers solely to paying customers and hasn’t released under an open-source license. Hortonworks also creates a lot of supporting code for Hadoop but releases all of it as open source.

Should this matter? Maybe. My hunch is that Red Hat wants an open-source safety valve on its Hadoop partnership. Given the frothy hype around Hadoop, it’s likely that big, proprietary incumbents like Oracle or HP could try to buy Cloudera or Hortonworks. In such a case, essential functionality would be owned by a potential Red Hat competitor rather than by an open-source community.

The real risk is non-adoption

Important as this consideration is, it strikes me as much more important that Red Hat work with the Hadoop vendor that makes it easiest to derive value from Hadoop -- not necessarily the most open vendor.

This is not to suggest that Hortonworks doesn’t offer value beyond its open-source bona fides. It does.

I simply believe that open source shouldn’t be the primary driver of Red Hat’s big data strategy, any more than its early RHEL strategy was driven by open source. RHEL won first and foremost because it delivered the enterprise database of choice: Oracle. Next, it won by certifying thousands of other proprietary technologies.

In conclusion, the first consideration for every big data vendor today should be to hide the underlying technology as much as possible so as to surface real, easy-to-capture business value. Openness is nice but not sufficient.