It’s really hard to be a tech dinosaur in data infrastructure these days. While the incumbents still have the luxury of executive relationships with customers and a stranglehold on maintenance contracts, a new wave of data infrastructure vendors keeps making life easy for plebeian developers who no longer need to get the CIO’s permission to do, well, anything.
Now a new startup, Dremio, is democratizing data in the same way that AWS cloudified access to hardware. Let’s all shed a tear for IT, even as we celebrate this newfound freedom for BI.
Data scientists are people, too
No one really cared about the lowly developer until AWS came along. AWS grabbed a six- to seven-year head start on its cloudy cousins by first recognizing and servicing developers’ needs for high-quality infrastructure, even as the dominant server vendors ignored them in favor of fat contracts with the CIO. In the years BC (before cloud), if a developer wanted to deploy an app, she had to ask IT nicely and wait a few months for the necessary infrastructure. In the years AD (AWS domination), that same developer can get immediate access to fantastic hardware infrastructure with a credit card.
Developers, it turns out, aren’t the only huddled masses yearning to breathe free.
With data today, things look a lot like the BC world, except that the underserved market is BI analysts and data scientists, who are completely dependent on IT to provision data. For all our talk about the rise of the data scientist, that same data scientist faces a multi-month wait while IT plows her data through ETL processes, data marts, etc. before she finally gets the data she needs to do her work.
This wouldn’t be such a problem except that no enterprise of any size has all its data sitting in one place. Ironically, however, every tool for working with data assumes that all the data is in one high-performance database. This simply isn’t the case.
Dremio’s golden ticket
What companies really need is a self-service model for business users, and this is exactly what Dremio, newly out of stealth with a bevy of big data rockstars at the helm, has set out to deliver, as Dremio CMO (and former MongoDB executive) Kelly Stirman told me in an interview.
Because enterprise data sits in silos, the primary strategy for working with it is to copy it into a central place like a Hadoop cluster, then export it to something like HPE Vertica to accelerate processing (because Hadoop is dog-slow). Dremio, however, removes this need to move data around, because it virtualizes access to all of an enterprise’s different data sources. In other words, Dremio makes all enterprise data look like it’s sitting in the same place, like tables in a relational database, which is the paradigm that every BI tool assumes.
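To make the idea concrete, here is a toy sketch of what "looks like tables in one relational database" means in practice. It is not Dremio's API or how Dremio works internally; it uses Python's built-in sqlite3 module as a stand-in, and the source names (crm, billing), tables, and rows are all made up for illustration. Two physically separate databases are attached to one connection so that a single SQL query can join across them without first copying the data into a central store.

```python
import sqlite3


def build_virtual_view():
    """Attach two separate (hypothetical) data stores to a single connection."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates its own independent in-memory database,
    # standing in for a distinct enterprise silo.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS billing")

    # Made-up schemas and rows, purely for illustration.
    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)",
        [(1, 100.0), (1, 250.0), (2, 75.0)],
    )
    return conn


def revenue_by_customer(conn):
    """Run one SQL query that joins tables living in different stores."""
    query = """
        SELECT c.name, SUM(i.amount)
        FROM crm.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for name, total in revenue_by_customer(build_virtual_view()):
        print(name, total)
```

The analyst's query never says where the rows physically live, which is the property a BI tool relies on. A real virtualization layer would additionally have to reach remote systems of very different kinds and make the result fast, which this sketch does not attempt.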
Oh, and Dremio delivers interactive speeds similar to HPE Vertica without, of course, having to pay a gazillion dollars for Vertica. (Andrew Brust walks through how it all works over on ZDNet.)
Not that Dremio dreams of having enterprises dump Teradata, Informatica, HPE Vertica, and other “paid for” data sources. Not yet, anyway. Instead, as Stirman said, Dremio hopes to make those investments better by, again, virtualizing access to these disparate data sources.
Cloudera co-founder Mike Olson has said: “No dominant platform-level software infrastructure has emerged in the last ten years in closed-source, proprietary form,” and Dremio keeps to this trend. Licensed under a permissive Apache license, Dremio is built around the Apache Arrow project, a fantastic tool for doing in-memory analytics. Think of Arrow as the engine and Dremio as the rest of the car, Stirman suggested.
Sure. But I’d rather think of Dremio as yet another way that open source is taking over data infrastructure, putting IT behemoths to the sword. It’s not great for their quarterly earnings, but it’s fantastic for data scientists and BI analysts who have work to get done, and need the data now.