NoSQL databases promise to upend a decades-old relational approach to data. But with more than 100 different NoSQL options to choose from, separated into unfamiliar categories like "document," "wide column," "key-value," and "graph," NoSQL's diversity may be its own worst enemy.
Today, however, DataStax, the company behind Cassandra, a wide column database, announced its first-ever acquisition, bringing together the best of Cassandra and TitanDB (a graph database). The move promises to make NoSQL an easier landscape to navigate, even as it helps DataStax claim increasingly sophisticated workloads.
Graphs, columns, and more
Open source powers big data, with most of the underlying infrastructure available at no charge to those savvy enough to download and develop it. Nowhere is this more true than among NoSQL databases, which number well over 100 and deliver surprisingly different capabilities.
Despite the common categorization, there are more differences than similarities between a document database and a key-value store.
Unlike a key-value store, Cassandra, one of the most popular NoSQL databases (currently ranked #8 on DB-Engines' list of popular database products), is a general-purpose database that fills a range of needs within an enterprise.
When I spoke with DataStax co-founder Matt Pfeil last week, however, he pointed to five generic use cases that make up 80% of Cassandra workloads:
- Recommendation engines
- Fraud detection
- Internet of Things
- Product catalogues
Interestingly, two of those categories — fraud detection and recommendation engines — are more suited to a graph database than a wide column database. According to Pfeil, DataStax felt it "had a duty to provide higher level value to our customers."
It probably didn't hurt that, according to Gartner's report, "IT Market Clock for Database Management Systems," "Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture."
In other words, there's money in them thar graphs.
Rather than leave Cassandra customers to essentially build their own graph database on top of Cassandra, DataStax decided to bring that functionality in house by acquiring Aurelius, the company behind TitanDB.
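To give a sense of what "building your own graph" on a wide column store can involve, here is a minimal sketch in plain Python. It is an illustration only, not Titan's or Cassandra's actual storage format or API: the `rows` dictionary stands in for a wide-row table (one row per vertex, one column per edge), and the function names and sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical wide-row layout: one row per vertex, one "column" per edge.
# Row key = source vertex; column key = (edge label, target vertex).
rows = defaultdict(dict)

def add_edge(src, label, dst):
    """Store an edge as a column in the source vertex's row."""
    rows[src][(label, dst)] = True

def record_purchase(user, product):
    """Write the edge in both directions so either end can be traversed."""
    add_edge(user, "bought", product)
    add_edge(product, "bought_by", user)

def neighbors(src, label):
    """Read one row and keep the columns with the requested edge label."""
    return {dst for (lbl, dst) in rows[src] if lbl == label}

def recommend(user):
    """Multi-hop traversal: products bought by users who share a purchase."""
    mine = neighbors(user, "bought")
    suggestions = set()
    for product in mine:
        for other in neighbors(product, "bought_by"):
            if other != user:
                suggestions |= neighbors(other, "bought")
    return suggestions - mine

# Made-up sample data for illustration.
record_purchase("alice", "laptop")
record_purchase("bob", "laptop")
record_purchase("bob", "mouse")
print(sorted(recommend("alice")))  # ['mouse']
```

Even this toy recommendation needs several row reads per hop, and the application has to maintain both edge directions itself. That is roughly the bookkeeping a dedicated graph layer takes off a customer's hands.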
Building an enterprise-class data platform
The two companies have loosely collaborated for some time, with Aurelius' Dr. Matthias Broecheler speaking at Cassandra Summit 2014. The goal, however, isn't simply to cobble together Cassandra and TitanDB in some loose data affiliation or to have them power small-scale applications. Many customers were already doing this on their own, stitching together the systems.
Other graph database options like Neo4j tend not to be used, according to DataStax, because Neo4j is a scale-up architecture that can scale reads but has a single choke-point on writes, making it a poor choice to achieve horizontal scale.
And this scale is really what Cassandra users needed from a graph database partner. Enter Aurelius.
As Broecheler told me, "Trying to run a graph database across a distributed system is a very hard engineering problem, one that Titan does well. The TitanDB team has figured out how to map relationships across distributed systems in a highly efficient manner."
By combining the two, DataStax can credibly claim to offer "the only unified distributed database management platform to include graph, analytics, search, and in-memory capabilities" — all running at significant scale.
It's an ambitious goal: building a platform to handle the industry's increasingly diverse data sets.
And, in my experience, it's a critical one if the industry is to move forward.
One of the things I learned in my years with NoSQL databases is that developers don't want to have to learn 20 different databases. They want just a few that can be used for a broad array of applications.
By blending the best of wide column and graph database capabilities, DataStax just made Cassandra even more general-purpose, while simultaneously making it an easy choice for applications that need to incorporate graph data relationships at significant scale. In other words, it just became an even better option for multi-model environments.
Some have argued that we're entering the era of "polyglot persistence," a time when the developers running the enterprise asylum will pick the best database for a particular workload, even if that means "standardizing" on an unwieldy assembly of different database options.
What the market actually wants is NoSQL databases that can handle the scale of distributed, diverse workloads: not 100 niche options, but a handful of powerful ones. With this acquisition, DataStax just cemented its place on the short list of NoSQL database vendors and, indeed, positioned itself as a unified data management platform.
Matt is currently head of the developer ecosystem at Adobe. The views expressed are his own, not those of his employer.
Matt Asay is a veteran technology columnist who has written for CNET, ReadWrite, and other tech media. Asay has also held a variety of executive roles with leading mobile and big data software companies.