Vendors like Cloudera and Hortonworks have their work cut out for them in the machine learning battle, as data gravity increasingly favors the cloud.
It's no secret that machine learning and its kissing cousin AI are all the rage. So much so, in fact, that companies are increasingly dressing up dumb apps as smart, and Cloudera is justifying a hefty IPO valuation in part on its ability to turn a Hadoop past into a machine learning future.
A more pertinent question, however, is whether the same cloud companies that are displacing enterprise data centers and taking over big data deployments will be the most likely winners in the machine learning war. Early signs suggest the answer is 'yes.'
Hadoop begets machine learning
For those who still think of Cloudera, Hortonworks, and MapR as Hadoop companies, that's old-school thinking. As the market has evolved beyond generic "big data," so have they. The most obvious (and potentially fertile) place to evolve, as Cloudera cofounder and chief strategy officer Mike Olson wrote in the company's S-1 filing to go public, is machine learning: "[T]he same system built for managing big data in the cloud also unlocks the power of machine learning for enterprises."
Machine learning has been around for decades, but only recently have we had the software and systems at low enough cost to make machine learning a mass-market enterprise phenomenon. Cloudera, for its part, is all in: Its S-1 filing mentions machine learning 83 times. Hadoop? Just 14.
This shift makes sense, given that Cloudera wants to sell business value, not technology. At any rate, it's increasingly pointless to pitch Hadoop, given that Hadoop is no longer Hadoop. Take a look inside Hadoop and you'll find lots of Spark, Kafka, Impala, and other new(ish) components, but no "Hadoop," as Gwen Shapira has highlighted.
Who wins the machine learning battle?
The real question for Cloudera, Hortonworks, MapR, IBM, and every other would-be machine learning aspirant isn't, as Ovum analyst Tony Baer declared, about "Spark vs. Hadoop," or some other way of asking questions of our data. Rather, he said, it's a matter of "cloud vs. Hadoop," or, in the context of machine learning, it's a question of where that data will live, and which vendors are best positioned to deliver.
Given data gravity—which is the idea that services and applications will gravitate to where the data is "born"—it's reasonable to assume that the more terrestrial vendors like Cloudera and Hortonworks will have a big part to play in the future of machine learning and AI. Why? Because most enterprise data sits inside corporate data centers, not in the cloud.
Not yet, anyway.
For data stuck in data centers, AWS offers Snowmobile, an 18-wheeler truck that can move up to 100 petabytes of data at a time. If this seems bizarre (it sort of is), not to worry: Apps increasingly live in the cloud, and data will live there, too.
That's a clear argument for the public cloud vendors to own machine learning long term. Or it would be, except that companies like Cloudera argue that their products were "designed for public cloud infrastructure." In Cloudera's case, 18% of its customers run its software in the public cloud already. At Hortonworks, 20% to 25% of customers run in public or hybrid cloud environments, according to CEO Rob Bearden.
There is, however, a difference between what a Cloudera can provide vs. what AWS offers. The former delivers software that runs in the cloud, but leaves an enterprise's IT department to "actively deploy, patch, and manage the cloud instances just like they do in the data center," as Baer pointed out. For an AWS or Microsoft Azure with "home court advantage," he argued, the machine learning services are "fully managed—eliminating headaches like patching."
This means that over time, the public cloud vendors are likely to reap more from machine learning's rise than those that can't match their native, cloud-based services. In this full-cloud world, there's stiff competition. According to Algorithmia CEO Diego Oppenheimer, "Google has the most credibility based on tools they have; Microsoft is the one that will actually be able to convince the enterprises to do it; and Amazon has the advantage in that most corporate data in the cloud is in AWS. It's anybody's game."
"Anybody" also includes Cloudera and Hortonworks to be sure, but they're likely going to have to find ways to match the native cloud capabilities AWS, Microsoft Azure, and Google Cloud offer, just as MongoDB ultimately decided to offer its own "as a Service" product. This shift to the public cloud will take time—up to two decades, by AWS chief Andy Jassy's reckoning, but the time to act on it is now.