As data changes, databases must change, too. Matt Asay explains how NoSQL is changing data forever.
You really can't manage today's data with a relational database. At least, you shouldn't.
Ask an experienced DBA about NoSQL databases like MongoDB or Cassandra and often they'll tell you that "you don't need to use NoSQL." They'll either invent RDBMS workarounds to force the square RDBMS peg into the round NoSQL hole, or they'll try to rearchitect an RDBMS to embrace NoSQL concepts.
The latter approach can be successful for some applications, but for an increasing array of modern workloads, you're just going to have to get used to NoSQL, as University of California, Berkeley professor Michael Franklin insists.
A database geek himself, Franklin notes that "things have really fundamentally changed." In fact, "Probably things have changed the most since the adoption of the relational model back in [the] '80s." That's a big shift, and it's all being driven by big data.
It's not really about "big"
"Big data" is one of the most unfortunate terms our industry has invented, because it has fixated our attention on sheer volume of data, despite that being perhaps the least interesting aspect of modern data.
As Franklin posits,
"[W]hat's really fundamentally different about this new-generation data management isn't just scalability, but it's really flexibility. If you look at the ability to store data first and then impose structure on it later--sometimes this is called schema on read or schema on need--that's a complete game changer."
The relational model introduced a "game changer" of similar magnitude back in the 1980s. Prior to the RDBMS, a developer needed both to structure her data and to know the queries in advance. With the advent of the RDBMS, a developer still had to impose structure on her data, but she could figure out how to query it later.
This was revolutionary, and it served the industry well for decades. That is, until big data came along.
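To make "schema on read" concrete, here is a minimal Python sketch of the idea Franklin describes: records are stored exactly as they arrive, and structure is imposed only when they are read. The records, field names, and the `read_listing` helper are hypothetical, invented for illustration; this is not any particular database's API.

```python
import json

# Hypothetical raw records, stored as-is. Nothing checks their shape at write
# time, so fields can be missing or carry inconsistent types.
raw_store = [
    '{"id": 1, "title": "Bike for sale", "price": 120}',
    '{"id": 2, "title": "Room to rent", "price": "850", "city": "Oakland"}',
    '{"id": 3, "title": "Free couch"}',
]


def read_listing(raw):
    """Impose structure at read time: pick fields, coerce types, fill gaps."""
    doc = json.loads(raw)
    price = doc.get("price")
    return {
        "id": int(doc["id"]),
        "title": str(doc["title"]),
        # Price may be a number, a numeric string, or absent altogether.
        "price": float(price) if price is not None else None,
        "city": doc.get("city", "unknown"),
    }


listings = [read_listing(raw) for raw in raw_store]
for listing in listings:
    print(listing)
```

The trade-off is that the reader, not the database, now carries the burden of deciding what a well-formed record looks like.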
Take Craigslist as an example.
Craigslist happily stored its data in MySQL. As it scaled, the rigid structure of its schema created major headaches:
"[T]he structure of their data had changed several times over the years. This alone made any change to the database schema a costly, prolonged nightmare, as changes often meant downtime... And if database alterations were a challenge, just imagine how difficult introducing entirely new features became? What's more, each change to the live database schema required a corresponding change to the entire archive--a process that took months every time. And during these updates, the archival process had to be put on hold, which meant stale data piled up in the live databases, slowing down the site's performance."
In the past, it may have been acceptable to wait three months to introduce new functionality. But today, enterprises must iterate quickly to remain competitive. The rigid schema of the RDBMS, once an advance over previous generations of databases, had become a problem.
Craigslist moved to a NoSQL database (MongoDB), which allowed it to iterate quickly, with changes taking effect in minutes (or seconds), not months.
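The difference in how a new field gets introduced can be sketched in a few lines of Python. This is an illustration only, not Craigslist's actual setup: the relational side uses an in-memory SQLite table from the standard library, the document side is a plain list of dictionaries standing in for a document store, and the `photo_url` field is hypothetical. On a toy table the schema change is instant; the pain described above comes from running that step against large live tables and their archives.

```python
import sqlite3

# Relational side: a column must exist before any row can use it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO listings (id, title) VALUES (1, 'Bike for sale')")

try:
    conn.execute(
        "INSERT INTO listings (id, title, photo_url) "
        "VALUES (2, 'Couch', 'couch.jpg')"
    )
    insert_failed = False
except sqlite3.OperationalError:
    # The table has no photo_url column yet, so the write is rejected.
    insert_failed = True

# The schema has to change first; only then can the new field be written.
conn.execute("ALTER TABLE listings ADD COLUMN photo_url TEXT")
conn.execute(
    "INSERT INTO listings (id, title, photo_url) VALUES (2, 'Couch', 'couch.jpg')"
)
rows = conn.execute(
    "SELECT id, title, photo_url FROM listings ORDER BY id"
).fetchall()

# Document side: records with and without the new field simply coexist.
documents = [{"id": 1, "title": "Bike for sale"}]
documents.append({"id": 2, "title": "Couch", "photo_url": "couch.jpg"})
with_photos = [doc["id"] for doc in documents if "photo_url" in doc]

print(insert_failed, rows, with_photos)
```

In the document version, older records never need to be rewritten; code that reads them just has to tolerate the field being absent.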
Freedom to grow
This ability to iterate is the heart of big data. No, NoSQL is not perfect for every sort of application, but there is a new generation of data that demands it, and that is handled significantly better with it.
According to Franklin, "There's a valuable set of applications that [doesn't] require the guarantees that traditional database systems were trying so hard to preserve: consistency, concurrency control, recovery, these sorts of things."
While a traditional DBA's head starts to explode at the mere thought of losing ACID, the reality is that "there are valuable--not just interesting but also financially valuable--applications for which you don't need those guarantees."
Apparently, the market agrees, as NoSQL databases have become hugely popular in the DB-Engines rankings, which measure popularity across a number of data points, including job listings, LinkedIn profile mentions, and more.
While it's clear that the top RDBMS still dominate industry usage, NoSQL is on the rise. When I started tracking this just two years ago, only MongoDB made the top 10, and Postgres and DB2 were still ahead of it. Now Cassandra (supported by DataStax) and Redis both make the grade, and the RDBMS giants have been falling.
This isn't because they're not good databases. They are. But the market increasingly depends upon messy, unstructured data and needs to iterate quickly to remain competitive. For this brave new world of big data, NoSQL is the right answer.