
Splice Machine: When traditional RDBMS hits the big data performance wall

Legacy relational databases can't cope with big data demands, so Splice Machine built a Hadoop-based alternative for running operational, real-time applications.


San Francisco-based startup Splice Machine markets itself as the only Hadoop relational database management system (RDBMS). It pitches the product as a scalable alternative to traditional RDBMS such as Oracle or IBM's DB2, giving developers and architects a platform for building real-time, enterprise-grade applications.

In our interview last month, Splice Machine cofounder and CEO Monte Zweben said, "What we are all about is replacing Oracle, MySQL, DB2 (IBM), or one of the old, traditional relational databases when they hit a wall from a performance perspective or from a cost perspective. And data is so voluminous today that this is happening all the time."

Zweben added, "What we have done is build a true ANSI SQL relational database system on top of the well-known, improving Hadoop stack. That's the main story about us. We bring a single message to the table, and that is affordable scale-out."

In February 2014, Splice Machine closed a $15M Series B funding round, and in August 2014 it added another $3M in financing. This year the firm has also garnered recognition from Red Herring, AlwaysOn, and Gartner, and it launched its public beta in May. Zweben told me during our interview to expect 1.0 general availability by the end of the summer.

Monte Zweben's resume has some interesting entries: Splice Machine is his fourth startup venture; he has served on several boards of directors; and there was that stint in the '80s at NASA where he led the team that developed the planning system for Space Shuttle maintenance.

Early on in our call, Zweben introduced what Splice Machine does by describing one of its main customers.

Case Study: Harte Hanks

Monte Zweben: Harte Hanks is a direct marketing service that serves consumer brands and retailers in a variety of ways, and it is a customer of ours. The service we are supplying is for campaign management.

This means that on behalf of very large-scale retailers they interact with consumers across multiple channels. This is omni-channel campaign management -- sending out emails and triggering emails when consumers reach various events and states in their relationship with the retailer.

To do this, Harte Hanks put together a suite of software: IBM's Unica campaign management app as the anchor, their own Trillium data cleansing service, Cognos from IBM for reporting, and Ab Initio for ETL processes that extract information from point-of-sale systems, e-commerce systems, and other systems that the retailers have.

All these systems were powered by Oracle RAC, and they hit a big wall: significant performance problems. They needed a solution, searched for nine months, and found Splice Machine.

Now Splice Machine is replacing Oracle while enabling Harte Hanks to keep all of the software I just mentioned that ran on top of Oracle. So Harte Hanks does not have to retrain its people; they keep their apps, the same tools, and their skill sets. They just replaced the database layer and got a tenfold price/performance improvement. And that's the story we are bringing to market.

Our interview with Splice Machine CEO Monte Zweben

TechRepublic: What business need did you see in the market when founding Splice Machine?

Monte Zweben: The business need was that so many people in IT are suffering because the databases that we have used for 30 years are not able to keep up with the volume of data that is now necessary to drive applications.

For example, if you are doing digital marketing -- which any brand has to do -- the amount of data your applications need to access and store is unbelievable: the clickstream, the data that comes out of your customer systems, your e-commerce systems, your ERM (enterprise relationship management) systems, let alone the data that you can buy from third-party sources like Acxiom or BlueKai. And today's database technology can't handle that.

On the flip side, I am on the board of Rocket Fuel, a company that recently went public and that built an architecture to serve digital marketing in a very big way. It is an amazing company that optimizes media for large-scale online display advertising for brands everywhere.

What they put together was one of the most elegant and creative software stacks for dealing with this volume of data. It serves the optimal advertisement to people by bidding in real time on all the different placements, finding the best possible ad to put in front of somebody at the best possible price.

And they provably optimize media purchasing. But they had to build that software stack themselves on top of the proven Hadoop stack, writing it in Java in this wonderful architecture.

And what occurred to me and our team is that for IT in the enterprise to build next-generation applications like Rocket Fuel's, the only way it will succeed is if we deploy that same power of distributed computing, but build it on something IT already knows. For us, that's SQL: the database language they understand, have been trained in, and know.

And if we were able to deliver a true relational database management system that leveraged that distributed system capability of Hadoop, we knew that IT in the global enterprise would be able to deliver on the kind of promise that companies like Rocket Fuel deliver today.

TechRepublic: It looks like Splice Machine has applications in a number of verticals. Who or what is your target market, and what are their pain points?

Monte Zweben: If there is one thing that is really important for us to evangelize in the marketplace, it is the fact that affordable scale-out solutions exist, and also that scale-out is not just for data science and analytics anymore.

And to your point, our relational database management system supports what is called OLTP (online transaction processing) applications, not just OLAP (online analytical processing) applications. What that means is these transactional applications have concurrent, real-time reads and writes going on.
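To make the OLTP/OLAP distinction concrete, here is a minimal sketch. It assumes Python's standard sqlite3 module as a stand-in for any SQL RDBMS; it is not Splice Machine's API, and the table and function names are hypothetical. The OLTP-style calls touch one row at a time and commit immediately, while the OLAP-style call scans and aggregates many rows.

```python
import sqlite3

# SQLite stands in for a generic SQL RDBMS; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    segment     TEXT    NOT NULL,
    amount      REAL    NOT NULL
);
""")

def place_order(conn, customer_id, segment, amount):
    """OLTP-style write: insert a single row and commit right away."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO orders (customer_id, segment, amount) VALUES (?, ?, ?)",
            (customer_id, segment, amount),
        )
    return cur.lastrowid

def order_total(conn, order_id):
    """OLTP-style read: point lookup by primary key."""
    row = conn.execute(
        "SELECT amount FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    return row[0] if row else None

def revenue_by_segment(conn):
    """OLAP-style read: scan and aggregate across many rows."""
    return dict(conn.execute(
        "SELECT segment, SUM(amount) FROM orders GROUP BY segment ORDER BY segment"
    ).fetchall())

oid = place_order(conn, 1, "loyal", 40.0)
place_order(conn, 2, "new", 15.0)
place_order(conn, 3, "loyal", 25.0)
print(order_total(conn, oid))    # 40.0
print(revenue_by_segment(conn))  # {'loyal': 65.0, 'new': 15.0}
```

In a real OLTP deployment, thousands of sessions would issue the small reads and writes concurrently while analysts run the aggregate queries; this single-connection sketch only shows the difference in the shape of the workloads.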

The best way to describe that type of application is just to allude to the last time you did any online shopping. You are shopping around, and you are looking at pages, you are perusing, and you are putting things into your shopping cart, looking at prices, and applying promotions. At the same time that you are doing that, thousands of other people are doing the same thing.

And keeping a database consistent when all of that is going on at the same time requires a special kind of database, one that adheres to what the techies call ACID properties (atomicity, consistency, isolation, and durability). ACID is what makes real-time applications possible. And we believe that we are the only SQL-on-Hadoop player that provides ACID properties to handle both OLAP and OLTP workloads.
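As an illustration of what atomicity and consistency buy a shopping application, here is a minimal sketch, again assuming Python's sqlite3 module as a generic SQL stand-in rather than Splice Machine itself; the `inventory` and `orders` tables and the `checkout` function are hypothetical. A checkout records the order and decrements stock in one transaction, so if the stock check fails, the order insert is rolled back too.

```python
import sqlite3

# Hypothetical schema; the CHECK constraint enforces "stock never goes negative".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0));
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, sku TEXT NOT NULL, qty INTEGER NOT NULL);
INSERT INTO inventory VALUES ('shoe-42', 3);
""")

def checkout(conn, sku, qty):
    """Insert the order and decrement stock as one atomic transaction."""
    try:
        with conn:  # one transaction: commit on success, roll back on any error
            conn.execute("INSERT INTO orders (sku, qty) VALUES (?, ?)", (sku, qty))
            cur = conn.execute(
                "UPDATE inventory SET qty = qty - ? WHERE sku = ?", (qty, sku)
            )
            if cur.rowcount != 1:
                # Unknown SKU: raising here also undoes the order insert.
                raise LookupError(f"unknown SKU: {sku}")
        return True
    except (sqlite3.IntegrityError, LookupError):
        # IntegrityError comes from the CHECK constraint (insufficient stock).
        return False

print(checkout(conn, "shoe-42", 2))  # True: stock goes from 3 to 1
print(checkout(conn, "shoe-42", 2))  # False: would go negative, so nothing is kept
print(checkout(conn, "boot-7", 1))   # False: unknown SKU, order insert rolled back
```

This sketch exercises only atomicity and consistency on a single connection; isolation (many shoppers at once) and durability (surviving crashes) are properties the database engine must provide beyond what this example demonstrates.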

What does that mean in the business world? That means we can do the digital marketing applications that I just talked about. You are not only doing long-running queries and analytics on market segments that might be useful for you to target, but you are also monitoring real-time interactions, personalizing your website or the emails that are going out, and registering in real time what the consumer is doing so that you can trigger the right behavior in the next interaction.
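The "register in real time, then trigger the next interaction" pattern can be sketched as follows. This is an illustration under stated assumptions: sqlite3 stands in for the operational database, and the `events` table, the event kinds, and the cart-reminder rule in `next_action` are all hypothetical, not part of Splice Machine or Harte Hanks' software. Because each event is committed as it happens, the very next decision sees it.

```python
import sqlite3

# Hypothetical event store; SQLite stands in for the operational database.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE events (
    event_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    kind        TEXT    NOT NULL  -- e.g. 'view', 'add_to_cart', 'purchase'
)""")

def record_event(conn, customer_id, kind):
    """Register an interaction as soon as it happens (real-time write)."""
    with conn:
        conn.execute(
            "INSERT INTO events (customer_id, kind) VALUES (?, ?)", (customer_id, kind)
        )

def next_action(conn, customer_id):
    """Hypothetical rule: cart activity with no purchase triggers a reminder email."""
    kinds = {k for (k,) in conn.execute(
        "SELECT DISTINCT kind FROM events WHERE customer_id = ?", (customer_id,)
    )}
    if "add_to_cart" in kinds and "purchase" not in kinds:
        return "send_cart_reminder"
    return "no_action"

record_event(conn, 7, "view")
record_event(conn, 7, "add_to_cart")
print(next_action(conn, 7))  # send_cart_reminder
record_event(conn, 7, "purchase")
print(next_action(conn, 7))  # no_action
```

In a batch pipeline, the purchase might not be visible until the next ETL run, and the customer could receive a reminder for a cart they had already bought; reading from the live transactional store avoids that gap.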

So digital marketing is a really important use case for us. It spans many verticals including consumer brands, retail, financial services, automotive, telecommunications, entertainment -- everyone who deals with the consumer has to do that.

TechRepublic: What differentiates Splice Machine?

Monte Zweben: The key thing that differentiates us is that real-time, transactional capability. Everybody who is doing big data, and SQL with big data, is focused on data science: a batch of data comes in from other sources into the Hadoop environment, data scientists crunch on that data by applying analytics and creating reports and dashboards, and then those reports and dashboards are distributed to decision makers and business folks to make decisions and take action.

That is a slow batch cycle, often what we call T minus one day or week: the action or decision is being made on the basis of data from yesterday or earlier, because the ETL process takes time.

What we are all about is powering real-time applications that may have a web browser front end, or a mobile front end, or a social front end, or just be consuming data in real-time, rather than a team of data scientists crunching on data.

That distinction between running an app -- powering an application and performing real-time operations -- and doing data science is the distinction between having a general-purpose relational database management system vs. a SQL layer on a Hadoop analytic system.


About

Brian Taylor is a contributing writer for TechRepublic. He covers the tech trends, solutions, risks, and research that IT leaders need to know about, from startups to the enterprise. Technology is creating a new world, and he loves to report on it.
