Cloudera co-founder identifies the biggest opportunities for big data

Want to see the future of big data? Just ask Cloudera co-founder Mike Olson.

Cloudera, reportedly on track to generate nearly $200 million in revenue this year, could be forgiven for bragging.

And yet when I sat down with Mike Olson, Cloudera's co-founder and chief strategy officer, he was anything but arrogant. As he puts it, "Data matters across literally every industry, so nobody gets to be king." This echoes something Olson wrote back in 2012: "What's exciting about [Hadoop] isn't the opportunity it's giving to vendors like us. It's the value that Hadoop is unlocking in big data for the industry and for society at large."

To gauge just how far the industry has come in unlocking that value, and where it's likely to go next, I caught up with Olson for a mid-year reset, and learned that Cloudera's go-to-market strategy now embraces "verticalizing" big data, among other things.

Here's how he explained this strategic move for the big data pioneer.

TechRepublic: What are the hottest verticals for big data in 2015?

Olson: Data matters across literally every industry, so nobody gets to be king. That said, though, there are sectors that are moving fast and doing really interesting things.

Telco and the "customer churn" problem is one area where we're seeing really exciting work in 2015. Telcos are spending north of $300 (USD) per new customer acquisition, so increasing retention even fractionally really affects their revenues. We recently demonstrated with partners Trifacta and Zoomdata how an enriched view of better data from more sources can help telcos reduce churn.

TechRepublic: What's so tough about the customer churn problem, from a data perspective?

Olson: Telco is really challenging because of the complexity of its data mapping and consolidation use cases. We call these the "tip of the iceberg" for big data in telco--and here we're talking about all of the domains that cross user profile and usage data (account info, transactions, things like that), mobile and devices (GPS / location, set-top box logs), network logs, marketing and CRM systems, and then data in the public domain.

Historically, all that data was carefully siloed. Telcos had systems for running their networks, decision support for customer engagement and marketing, their back-office billing systems, and so on. Those systems were spread out and walled off from one another.

If you think about it, customer churn is really a lagging indicator of customer satisfaction--by the time you find out a customer is unhappy, it's too late.

What telcos want is to understand data across silos and to enrich it with external data, like social media. The objective is to understand customers' behavior and figure out what their experience is, without even talking to them. Companies want to build predictive models of what their users are likely to do, based on that previously invisible experience. We see this creating leading indicators of customer churn, which is really much more actionable data.

TechRepublic: How exactly is big data enabling this move towards more "leading indicators" of churn?

Olson: Well, the so-called "unified customer view" that has been promised by the storage and BI vendors for decades has really been challenged by today's demands for "real-time."

You have a lot of telcos with this huge mix of traditional SQL datastores and other legacy systems that are sitting alongside more real-time data stores. Getting these systems to talk to each other at all is extremely hard. Getting them to do it in milliseconds is pretty well impossible.

Solve that, and you still have a very tough processing and time-to-visualization problem to crack before telco analysts can actually make the data useful. If you can't deliver real results to real users in real time, you're in real trouble. We see the convergence of all of these factors giving rise to what we call "full stack analytics."

We see a few particular technologies really driving innovation in the aggregation and speed area. Cloudera's Enterprise Data Hub is providing a foundation for aggregated data storage, transformation and processing, self-service exploratory BI, and analytics on top of Hadoop, and it's attracting ISVs who are building advanced capabilities on top of it.

We knock down the silos and make it easy to integrate the data.

On that foundation, Zoomdata brings critical performance and ease of use to big data discovery for actual humans. Traditional BI tools are terrible at performance when you get past the terabyte range. Zoomdata makes it possible for telcos to query all of their data, see results "sharpening" as they come in, pause, and start new queries.

They are at the cutting edge of making big data discovery actually map back to human operators. We're really excited about them. When you hook Zoomdata up to state-of-the-art analytics databases optimized for big data--like Impala, part of that Enterprise Data Hub--the performance is really astounding.

To easily transform raw, complex data into clean and structured formats for analysis so you can get more value from your data faster, Trifacta is a great tool. With the new Trifacta deployment on Cloudera Live, you get the full functionality of Cloudera's platform, along with an integrated trial of the Trifacta Data Transformation Platform to help you wrangle a variety of complex data.

TechRepublic: Where do you see the biggest disruptions in big data applications coming over the next five years as Hadoop continues to penetrate the enterprise?

Olson: For the next five years, I think the action is going to be in solutions for real business problems. Everyone's heard about big data and is interested. We've seen great software tools--Zoomdata and Trifacta, for example--emerge to take native advantage of the new platform.

But business users today need to attack important problems without hiring armies of data scientists or radically retooling their IT and analyst teams.

There's tremendous demand in our customer base for apps that measure risk in equity portfolios, monitor networks for intrusions and report on them in real time, and improve satisfaction to keep customers on board and engaged. I think that the telco example I cited is a great template of the kind of integrated solutions that large enterprises need.

There are lots of problems that these new tools can be combined to solve.
