The secret ingredients for making Hadoop an enterprise tool

A Gartner survey on Hadoop is encouraging, says Andrew Brust, given that Hadoop is not yet fully in the hands of enterprise users. Here's what he thinks can change that.

The enterprise adoption rate for Hadoop is 26%, according to a Gartner survey released in May 2015. But is that high or low? Is it good or bad?

It's a "big adoption number" according to ZDNet big data columnist and Datameer evangelist Andrew Brust. He wrote on ZDNet that he would have guessed lower, since "Hadoop's legacy is that of a specialist's tool, not an enterprise tool." (ZDNet and TechRepublic are CBS Interactive properties.)

When I asked Brust in an email Q&A how firms can change that, he said "what companies can do to make Hadoop into an enterprise tool is embed it as an engine — which is what it's designed for — rather than feature it as a component that users touch directly. Then add things like security, governance and cost-based optimization, and things get much more enterprise-appropriate."

Big data analytics firm Datameer has been focused on governance. When asked how governance relates to big data adoption, Brust explained that it "is a prerequisite in the enterprise, period; not just for big data. Most enterprise customers have regulatory regimes to contend with and tools that can't help them comply are next to useless."

Also in this Q&A, Brust discussed Datameer's recently launched big data governance capabilities, the issues he is most passionate about as a tech evangelist, and his take on data democratization.

TechRepublic: Last May, Gartner released a survey of its Research Circle members showing a "low" 26% big data adoption rate. You wrote on our sister publication ZDNet that this figure is good, given that Hadoop has been "a specialist's tool." What can companies do to make big data into what you refer to as an enterprise tool?

Andrew Brust: Yes, I think 26% is a very big adoption number for something like Hadoop that is still relatively early in its adoption journey. Considering the complexity around it that I mentioned, Gartner's finding is positive, not negative. I think Gartner knows that: essentially they've said that hype outstrips the reality, but they've also said that Hadoop has moved past the hype cycle, so that leaves only good things to come. What companies can do to make Hadoop into an enterprise tool is embed it as an engine, which is what it's designed for, rather than feature it as a component that users touch directly. Then add things like security, governance, and cost-based optimization, and things get much more enterprise-appropriate.

TechRepublic: Another question about big data use in the enterprise: Why is governance necessary for big data adoption to grow?

Andrew Brust: Governance is a prerequisite in the enterprise, period; not just for big data. The BI world has been accommodating it for years, with respect to role-based security, lineage, audit, master data management, and more. Most enterprise customers have regulatory regimes to contend with, and tools that can't help them comply are next to useless. That the big data world disregarded these requirements for so long was, frankly, shocking. The gap is finally being closed. The question isn't whether that's necessary; it's why it took so long.

TechRepublic: On that topic, what benefits do the data governance capabilities Datameer announced this summer offer its customers?

Andrew Brust: The benefits are pretty big. In fact, some of what we announced, including role-based security around data and access, has been in the product for some time. But our lineage tools and our listener-based API for audit and for integration with other governance systems are huge. The lineage graph provides a very useful visual tool showing how data has flowed through Datameer. And virtually every operation that takes place within Datameer is pushed out over the API, so external systems can be made aware and Datameer becomes a fully participatory citizen in the emerging Hadoop governance ecosystem. Better yet, our architecture ensures this participation regardless of which external technologies come to dominate.

TechRepublic: As a tech evangelist, what issues do you feel most strongly and passionately about in the big data space?

Andrew Brust: My biggest issue is the current complexity in the ecosystem, and my greatest passion is anything that tames that complexity. I think Hadoop, on its own, is complex, and emerging technologies like Spark make it more so. Developers may love that, but business users and even enterprise developers need an environment that's more productive. Datameer is dedicated to that simplification, as are a few other players. We need more of it.

TechRepublic: Looking at big data innovation over the next two to three years, what potential changes do enterprises need to be most aware of?

Andrew Brust: Well, we're seeing a lot of innovation on the execution engine side of things. Spark has gathered a lot of attention. Flink is waiting in the wings. Streaming systems are maturing, which brings about new Apache projects like NiFi. So things are volatile, and enterprises need products that can work "close to the metal" of these new engines, but also abstract them, and the overall volatility, away, so the focus can be on the analysis task at hand, rather than on cobbling together the right combination of technologies to get it done.

TechRepublic: In its recent press release regarding Datameer's $40 million funding round, the company referred to taking "data democratization" global. How would you define that expression, both in terms of Datameer's mission and its meaning to analytics users?

Andrew Brust: To me, democratization is about making a technology more accessible and, significantly, more fun for a broader group of users, including users in business roles rather than technical ones. That's what the self-service BI movement was all about, for example. But, ironically, Hadoop on its own lacks not only true self-service capabilities but even the usability of pre-self-service enterprise BI. Fixing that has always been Datameer's goal. Command-line tools and shell scripts may be how technologists like to work with big data, and they may also enjoy figuring out which execution engine to use for a given task. But enterprise customers need something far more evolved and aligned with their workflow, something that automates engine choice and provides a business-oriented user interface. Democratization of big data means bringing that about.

TechRepublic: On a personal note, how do you define your own role and mission as an evangelist and technical product marketer at Datameer?

Andrew Brust: Big data technology is still new. Understanding what it can do, and appreciating what capabilities make it harder or easier to use, is not simple. My job is to help illuminate those things, and do so free of practitioner jargon. That way, customers know the potential of the underlying technologies and know how to work around the obstacles that can thwart that potential.

Note: TechRepublic, ZDNet, and Tech Pro Research are CBS Interactive properties.

About Brian Taylor

Brian Taylor is a contributing writer for TechRepublic. He covers the tech trends, solutions, risks, and research that IT leaders need to know about, from startups to the enterprise. Technology is creating a new world, and he loves to report on it.