Start-Ups

How Cloudera defined big data, and was defined by it

Cloudera quickly became a leader in the big data market after it launched in 2008. Here's how it turned Hadoop into an enterprise data hub.

clouderahero.png
Image: Cloudera

In the 2000s, big data quickly rose to prominence as the currency of Silicon Valley tech companies. Companies were producing data in new ways and Moore's law made storage cheap enough to hold onto that data. Still, there was a problem. Collecting data for data's sake, with no way to process or analyze it, made that data essentially useless.

While the open source project Hadoop was beginning to gain traction, Cloudera launched in 2008 and began to target enterprise customers for specialized deployments of the software. Cloudera quickly rose to prominence as one of the leading Hadoop vendors and continues to grow.

Mike Olson, the chief strategy officer and a co-founder of Cloudera, said that the value wasn't always clear in the early days and the team often had to evangelize Hadoop and big data.

"When we started we were a voice in the wilderness," Olson said. "Nobody had ever heard of the platform. Nobody understood why big data mattered."

Now the Cloudera ecosystem of partners is huge. Olson said the company has more than 1,000 partners that provide tools and services. According to Olson, much of what we are seeing now is a "generational shift" in how companies are thinking about data, and Cloudera is one of the forces driving that shift.

The company

Olson spent 25 years working in the database industry, working for companies such as Informix and Oracle. His first encounter with Hadoop was not the ah-ha moment one would expect for a founder.

"In the early 2000s Google invented this technology that became open source Hadoop, for processing basically all the information on the internet," Olson said. "It was an audacious ambition for Google to have at the time. All us database guys knew that it was physically impossible to build that kind of thing. Google didn't realize that and went ahead and did it."

What Olson is referring to are two seminal technologies that Google was working on at the time. In 2003, Google released a paper on its Google File System (GFS) and, in 2004, Google released a paper on MapReduce. GFS is a distributed file system, and the paper described MapReduce as: "a programming model and associated implementation for processing and generating large data sets."
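The MapReduce model the paper describes can be illustrated with the classic word-count example. The sketch below is a minimal, single-process illustration of the programming model, not Hadoop's actual API: a map step emits key-value pairs, a shuffle groups them by key, and a reduce step folds each group into a result. In a real Hadoop cluster these phases run in parallel across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in the document."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: fold one key's values into a single count."""
    return key, sum(values)

documents = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = (pair for doc in documents for pair in map_phase(doc))
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["the"])  # 3
```

The appeal of the model is that the programmer writes only the map and reduce functions; the framework handles distribution, grouping, and fault tolerance across the cluster.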

Then, Doug Cutting and Mike Cafarella created Hadoop, named after Cutting's son's toy elephant, and based on the research released by Google. Hadoop was quickly adopted by companies such as Facebook, Yahoo, and Amazon for data analysis on a large scale.

This piqued Olson's interest in big data, so he began working on a company that would make use of the Hadoop platform. Christophe Bisciglia of Google, Amr Awadallah of Yahoo, and Jeff Hammerbacher of Facebook were individually working on similar projects until all four of the founders met and decided to join forces.

"We banded together over the course of the summer, convinced ourselves that we could start one company together, rather than four companies separately, wrote a strategy document and a fundraising deck in the early fall and closed a Series A round with venture capitalists in the Valley, Accel Partners, in October 2008," Olson said.

clouderateam.jpg
The Cloudera founders and founding team.
Image: Cloudera

Shortly after they raised $5 million from Accel, the economic crisis hit with full power in Silicon Valley and funding for new companies slowed to a crawl. This gave Cloudera a strategic advantage in the big data space.

"Basically, we had money and no competition to go figure out how to go to market with this product. That's really where the business came from," Olson said.

Olson maintains that the goal of Cloudera remains the same — take the data processing magic of Hadoop and make it available to more traditional enterprises. Everybody is able to get their hands on huge amounts of data and now processing and analyzing that data is a real problem for most organizations.

Ping Li, a partner at Accel, said that the key to Cloudera's success was in realizing that big data processing wasn't just a problem for internet companies, but was something that would eventually be faced by all enterprises. Early on, Hadoop was a scalable, cost-efficient way to store and process large quantities of data that don't fit neatly into columns and rows, and Cloudera was able to make that work for business customers.

"What Cloudera has done is taken Hadoop and evolved that into the enterprise data hub (EDH)," Li said. He noted that Hadoop is a major component of it, but enterprises need more services, such as security, provisioning, and management. Cloudera essentially took Hadoop as a tool and made it enterprise-ready.

Big data success

One of the most obvious reasons for Cloudera's success is the founding team behind the product. The pedigree of talent among the four founders is high, and Olson said that their individual skills complement one another very well. This skill level also extends to the rest of the team. Olson said that, if he had to attribute the success of Cloudera to one thing, it would be the talent of the team as a whole.

"We've hired very well," he said. "We've managed to bring on great people who have done excellent work, and also served as a gravitational force to pull other great people into our orbit."

Li echoed this sentiment and added that a big part of Cloudera's recruiting strategy is finding people who are committed to the open source community.

"Early on in the company's history the founders recruited Doug Cutting, the creator of Hadoop, to the company," Li said. "And then they brought in lots of open source committers and, to this day, that's still a big part of the R&D team because so much of the technology is open source."

In addition to the team, much of Cloudera's success can be attributed to timing. Not only did the company have little to no competition in 2008, it was also facing a potentially explosive market opportunity. Hadoop came along at just the right time. Many of the big tech companies were already using Hadoop to work through the data they had, so it became clear that what Li called "the data deluge" was a problem that needed to be solved in the enterprise.

"The availability of this data at scale, that's kind of a new thing," Olson said. "In the 1990s, getting your hands on a terabyte was hard. These days, not getting your hands on a terabyte is hard. So much data is available that people are overwhelmed by it."

Of course, the way companies see data has changed dramatically since Cloudera began its journey. Cloudera's offerings changed based on the market and there are two key areas where Olson said that the company both leads the market and is subsequently led by the market:

  1. Enterprise manageability. Because Hadoop was born on the consumer internet, demand for security features such as PCI compliance and key management became critical. Knowing where the data came from and who had been able to touch it also became an urgent requirement, which led Cloudera to fold those details into its enterprise data hub.
  2. Data process and analysis. Making data available in new ways and opening up more analytic and processing workloads has made the data more valuable.

According to Olson, data is going to matter to society in ways that were never possible before. We have the ability to instrument things that we couldn't before by using sensors and measurement tools, and that will affect how the world sees data in the future.

"If you think big data has happened now, wait till all of the building sensors in the city of San Francisco wake up one day and they're on a network, and we can actually get our hands on that data," Olson said. "That data explosion hasn't even begun yet."

About

Conner Forrest is News Editor for TechRepublic. He covers startups and enterprise technology and is passionate about the convergence of tech and culture.
