NoSQL enables big data applications and has been on a tear as a result. But how do you decide if it's right for you?
Postgres has been on a roll lately. Despite a reputation for being the older, wiser fuddy-duddy of open-source relational databases, Postgres has been running circles around MySQL in terms of growth. In fact, Postgres even managed to inch back into fourth place on the DB-Engines database popularity rankings.
But as hot as Postgres has been, NoSQL databases are even hotter.
While there are a number of ways to measure NoSQL's rise, there is one clear reason for it: today's data is messy, variable, and growing extraordinarily fast. For such data, a new breed of database is required.
The hot get hotter
This may explain why it has been such a dismal year for Microsoft SQL Server. And Oracle. And MySQL. (IBM's DB2 has been on a slide for some time.)
While the big three relational databases (RDBMS) still command a massive lead over the next three database contenders (Postgres, MongoDB, and IBM's DB2), most of the growth in adoption goes to NoSQL databases.
Here's how the top 10 databases have fared over the past year:
While DB-Engines aggregates disparate data to score database popularity (including everything from Google searches to LinkedIn profile mentions), employer demand for particular technology skills is perhaps the best indication of adoption.
In that area, NoSQL databases, like MongoDB and Apache Cassandra, exhibit stratospheric growth relative to their more staid RDBMS counterparts:
While there are many reasons to turn to a NoSQL database, and even old-school workloads lend themselves well to NoSQL, the biggest reason for NoSQL's rise is big data.
The big get bigger
We no longer live in the world of neat-and-tidy, predictable data. Oh, sure, there are plenty of applications that still fit this model, but increasingly the world's data is messy, i.e., semi-structured or unstructured.
Relational databases were a revelation in their time. By decoupling query design from schema design, they let developers focus solely on schema design, knowing they could later query their data as they wanted. But this same revelation--fixed schema design--has become the RDBMS' Achilles' heel.
It's not that NoSQL has no schema. It does. (Or should.)
Rather, NoSQL allows the schema to be flexible and to be applied on read (rather than on write). The effect, as DataStax's Scott Hirleman argues, is impressive:
"NoSQL databases thrive in today's high-volume, high-variety online applications. They enable companies to be more agile, especially while deploying new features; more flexible (store varied and/or complex data types); and able to support scale without high costs and complexity."
These attributes--agility, flexibility, etc.--are not constrained to web giants like Google or Facebook. Mainstream enterprises need them, too.
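To make the schema-on-read idea concrete, here is a minimal Python sketch. It is illustrative only and does not use any real NoSQL database's API: the in-memory collection list and the write and read_users helpers are hypothetical names invented for this example. Documents of different shapes are stored untouched, and a schema (field selection, defaults, type coercion) is applied only when the data is read.

```python
import json

# Hypothetical in-memory "collection" standing in for a document store.
# Documents are stored as-is; no schema is enforced at write time.
collection = []


def write(doc):
    """Store the document without validating its shape (no schema on write)."""
    collection.append(json.dumps(doc))


def read_users():
    """Apply a schema at read time: pick the fields this reader cares about,
    fill in defaults for missing ones, and coerce types."""
    users = []
    for raw in collection:
        doc = json.loads(raw)
        users.append({
            "name": str(doc.get("name", "unknown")),
            "age": int(doc["age"]) if "age" in doc else None,
            "tags": list(doc.get("tags", [])),
        })
    return users


# Three documents with three different shapes are all accepted on write.
write({"name": "Ada", "age": "36"})
write({"name": "Linus", "tags": ["kernel"], "employer": "n/a"})
write({"age": 52})

users = read_users()
for user in users:
    print(user)
```

A relational table would typically reject or require a migration for the second and third documents; here the cost moves to the reader, which has to decide how to handle missing or extra fields. That trade-off is why the schema should still exist, even if the database does not enforce it.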
So, how do you decide if you can use a NoSQL database for your application?
Is NoSQL a yesSQL for you?
You can, of course, turn to all the various inane benchmarks out there. They are all "independent" and each purports to show that one database or another is the greatest thing To. Ever. Be. Invented.
One, for example, shows MongoDB to deliver 13x more predictable scaling than other NoSQL databases. Oh, but another shows Cassandra to deliver orders of magnitude better performance across a variety of workloads.
Benchmarks, however, aren't a very good source of factual information.
As Baron Schwartz writes, the real measure of database maturity ("demonstrated capability and quality with a lot of thought given to all the little things"), and the best way to ascertain whether a database is right for you, is to delve into that database's success stories.
It may seem obvious why success stories are important, but in the proprietary database world, the system is gamed to favor positive stories. Not so in open source, as Schwartz describes:
"Success stories and a community of users go together. If I can choose from a magical database that claims to solve all kinds of problems perfectly, versus one that has broad adoption and lots of discussions I can Google, I'm not going to take a hard look at the former. I want to read online about use cases, scaling challenges met and solved, sharp edges, scripts, tweaks, tips and tricks. I want a lot of Stack Exchange discussions and blog posts. I want to see people using the database for workloads that look similar to mine, as well as different workloads, and I want to hear what's good and bad about it."
The easiest way to gauge the utility of NoSQL for you, then, is simply to download one that looks promising and try it out. HBase is a natural pairing for Hadoop; Cassandra has always been good at simple but significant scale; MongoDB is perhaps the most approachable and allows for a broad array of workloads.
Each of these databases has thousands to tens of thousands of companies happily using it, usually for new workloads that help them tame big data of one kind or another. So, don't trust me that you need a NoSQL database. Download one today, and then read up on the success stories (and failures) to determine if it's right for you.