
10 things you should know about NoSQL databases

The relational database model has prevailed for decades, but a new type of database -- known as NoSQL -- is gaining attention in the enterprise. Here's an overview of its pros and cons.

For a quarter of a century, the relational database (RDBMS) has been the dominant model for database management. But, today, non-relational, "cloud," or "NoSQL" databases are gaining mindshare as an alternative model for database management. In this article, we'll look at the 10 key aspects of these non-relational NoSQL databases: the top five advantages and the top five challenges.


Five advantages of NoSQL

1: Elastic scaling

For years, database administrators have relied on scale up -- buying bigger servers as database load increases -- rather than scale out -- distributing the database across multiple hosts as load increases. However, as transaction rates and availability requirements increase, and as databases move into the cloud or onto virtualized environments, the economic advantages of scaling out on commodity hardware become irresistible.

An RDBMS might not scale out easily on commodity clusters, but the new breed of NoSQL databases is designed to expand transparently to take advantage of new nodes, and these databases are usually designed with low-cost commodity hardware in mind.
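To make "expand transparently" a little more concrete, here is a minimal sketch of consistent hashing, one common technique for spreading keys across hosts so that adding a node relocates only a fraction of the data. It is an illustration only: the article does not say which products use this approach, and the `HashRing` class, the node names, and the key format are all hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring (hypothetical, for illustration only)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node; smooths the distribution
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place the node's virtual points on the ring."""
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point at or after the key's hash."""
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # "scale out" by one host
after = {k: ring.node_for(k) for k in keys}

# Keys whose owner changed; every one of them should now live on the new node.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved to the new node")
```

The point of the design is that existing nodes never trade keys with each other when a host joins; only the new node takes over a share of the ring.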

2: Big data

Just as transaction rates have grown out of recognition over the last decade, the volumes of data that are being stored also have increased massively. O'Reilly has cleverly called this the "industrial revolution of data." RDBMS capacity has been growing to match these increases, but as with transaction rates, the constraints of data volumes that can be practically managed by a single RDBMS are becoming intolerable for some enterprises. Today, the volumes of "big data" that can be handled by NoSQL systems, such as Hadoop, outstrip what can be handled by the biggest RDBMS.

3: Goodbye DBAs (see you later?)

Despite the many manageability improvements claimed by RDBMS vendors over the years, high-end RDBMS systems can be maintained only with the assistance of expensive, highly trained DBAs. DBAs are intimately involved in the design, installation, and ongoing tuning of high-end RDBMS systems.

NoSQL databases are generally designed from the ground up to require less management: automatic repair, data distribution, and simpler data models lead to lower administration and tuning requirements -- in theory. In practice, it's likely that rumors of the DBA's death have been slightly exaggerated. Someone will always be accountable for the performance and availability of any mission-critical data store.

4: Economics

NoSQL databases typically use clusters of cheap commodity servers to manage the exploding data and transaction volumes, while RDBMS tends to rely on expensive proprietary servers and storage systems. The result is that the cost per gigabyte or transaction/second for NoSQL can be many times less than the cost for RDBMS, allowing you to store and process more data at a much lower price point.

5: Flexible data models

Change management is a big headache for large production RDBMS. Even minor changes to the data model of an RDBMS have to be carefully managed and may necessitate downtime or reduced service levels.

NoSQL databases have far more relaxed -- or even nonexistent -- data model restrictions. NoSQL Key Value stores and document databases allow the application to store virtually any structure it wants in a data element. Even the more rigidly defined BigTable-based NoSQL databases (Cassandra, HBase) typically allow new columns to be created without too much fuss.

The result is that application changes and database schema changes do not have to be managed as one complicated change unit. In theory, this will allow applications to iterate faster, though, clearly, there can be undesirable side effects if the application fails to manage data integrity.
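As a rough illustration of that trade-off, the sketch below uses a plain Python dict as a stand-in for a document store. It is not the API of any real NoSQL product, and the `put` and `loyalty_tier` helpers and the field names are invented for the example. A new attribute appears without any schema change, but the application has to cope with older documents that lack it.

```python
# A plain dict stands in for a document store (hypothetical; not a real client API).
store = {}


def put(doc_id, document):
    """Store any structure under a key; nothing validates its shape."""
    store[doc_id] = document


# Written by an early release of the application.
put("cust:1", {"name": "Ada", "email": "ada@example.com"})

# A later release starts recording a new attribute. No ALTER TABLE, no downtime.
put("cust:2", {"name": "Grace", "email": "grace@example.com", "loyalty_tier": "gold"})


def loyalty_tier(doc_id):
    """The application, not the database, supplies a default for older documents."""
    return store[doc_id].get("loyalty_tier", "none")


tiers = {doc_id: loyalty_tier(doc_id) for doc_id in store}
print(tiers)
```

If `loyalty_tier` indexed the field directly instead of using `.get`, the first document would raise a `KeyError`, which is the kind of side effect that shifts from the database to the application.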

Five challenges of NoSQL

The promise of the NoSQL database has generated a lot of enthusiasm, but there are many obstacles to overcome before these databases can appeal to mainstream enterprises. Here are a few of the top challenges.

1: Maturity

RDBMS systems have been around for a long time. NoSQL advocates will argue that their advancing age is a sign of their obsolescence, but for most CIOs, the maturity of the RDBMS is reassuring. For the most part, RDBMS systems are stable and richly functional. In comparison, most NoSQL alternatives are in pre-production versions with many key features yet to be implemented.

Living on the technological leading edge is an exciting prospect for many developers, but enterprises should approach it with extreme caution.

2: Support

Enterprises want the reassurance that if a key system fails, they will be able to get timely and competent support. All RDBMS vendors go to great lengths to provide a high level of enterprise support.

In contrast, most NoSQL systems are open source projects, and although there are usually one or more firms offering support for each NoSQL database, these companies often are small start-ups without the global reach, support resources, or credibility of an Oracle, Microsoft, or IBM.

3: Analytics and business intelligence

NoSQL databases have evolved to meet the scaling demands of modern Web 2.0 applications. Consequently, most of their feature set is oriented toward the demands of these applications. However, data in an application has value to the business that goes beyond the insert-read-update-delete cycle of a typical Web application. Businesses mine information in corporate databases to improve their efficiency and competitiveness, and business intelligence (BI) is a key IT issue for all medium to large companies.

NoSQL databases offer few facilities for ad-hoc query and analysis. Even a simple query requires significant programming expertise, and commonly used BI tools do not provide connectivity to NoSQL.
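To show what "requires significant programming expertise" can mean in practice, here is a hedged sketch of a per-customer sales total computed by hand over key-value records. The `orders` data, the key format, and the `total_by_customer` function are hypothetical and stand in for whatever scan or map-reduce mechanism a given store provides.

```python
from collections import defaultdict

# Hypothetical key-value records; a real store would be scanned through its own API.
orders = {
    "order:1": {"customer": "acme", "total": 120.0},
    "order:2": {"customer": "globex", "total": 75.5},
    "order:3": {"customer": "acme", "total": 30.0},
}


def total_by_customer(records):
    """Hand-written equivalent of:
    SELECT customer, SUM(total) FROM orders GROUP BY customer
    """
    totals = defaultdict(float)
    for record in records.values():  # full scan; no query planner to help
        totals[record["customer"]] += record["total"]
    return dict(totals)


totals = total_by_customer(orders)
print(totals)
```

In a relational database the same question is the one-line statement shown in the docstring, and a business analyst can type it into a BI tool without a developer's help.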

Some relief is provided by the emergence of solutions such as HIVE or PIG, which can provide easier access to data held in Hadoop clusters and perhaps eventually, other NoSQL databases. Quest Software has developed a product -- Toad for Cloud Databases -- that can provide ad-hoc query capabilities to a variety of NoSQL databases.

4: Administration

The design goals for NoSQL may be to provide a zero-admin solution, but the current reality falls well short of that goal. NoSQL today requires a lot of skill to install and a lot of effort to maintain.

5: Expertise

There are literally millions of developers throughout the world, and in every business segment, who are familiar with RDBMS concepts and programming. In contrast, almost every NoSQL developer is in a learning mode. This situation will resolve itself naturally over time, but for now, it's far easier to find experienced RDBMS programmers or administrators than a NoSQL expert.

Conclusion

NoSQL databases are becoming an increasingly important part of the database landscape, and when used appropriately, can offer real benefits. However, enterprises should proceed with caution with full awareness of the legitimate limitations and issues that are associated with these databases.


About the author

Guy Harrison is the director of research and development at Quest Software. A recognized database expert with more than 20 years of experience in application and database administration, performance tuning, and software development, Guy is the author of several books and many articles on database technologies and a regular speaker at technical conferences.

58 comments
xcelarsolutions

New NoSQL will include support for structured, unstructured, and semi-structured data. We should expect a lot of enterprise-level functions in NoSQL solutions.

mikojava

NoSQL can also be performed in-memory such as Oracle Coherence, VMWare Gemfire, Software AG Terracotta and Hazelcast.

mrgreenw

In general I'd like to comment on the thought that NoSQL databases are new. They are not. Before SQL databases came about there were a number of commercial database managers available. My personal favorite is Supra from Cincom Systems, Inc. This database manager is the third generation derivative born from the same DNA which formed the original non-SQL Total Database Manager from Cincom. Supra is used in production today with thousands of users accessing applications built on Supra databases. It is a high volume, high reliability, and high performance database manager available on MVS, VSE, Open VMS, Linux, Unix and Windows platforms.

J N

That was possibly the most ignorant explanation I've ever read on this topic.

derjanni

Thanks for the article. Although I often prefer relational databases like MySQL or Oracle because of data integrity and the comfort to know how it works, there is an article you might be interested in that I found valuable: http://www.kammerath.co.uk/nosql-on-the-spot.html - it's a pretty good comparison of all options.

nosql.io

Great article. I think 2012 is going to be the prime time for NoSQL.

herbmeehan

In a random email from Dice posting about a video game website looking for a noSQL person. The same posting had the most obscure technologies ever. They are a web company that is looking for a Python / NoSQL expert. Good luck with that one...

martymen

If you want to see a NoSQL-based system that is mature, stable and highly usable, have a look at EPIC or the V.A.'s VistA. Both are medical software systems based on the MUMPS hierarchical database that has been in use since the late '80s. As pointed out by others, this schema-free system can be used to model just about any kind of database you want - including relational - with huge scalability of data content and user number. It scores on the first nine points above, and an effort is now underway to address number 10 by training a new cadre of MUMPS programmers. Martin Mendelson, MD, PhD

estreur

I see the same misconception a lot, including in this article. Whether an application/database scales well is determined to a very large extent by

- the nature of the IO: the ratio between reads and writes (insert/update/delete)
- the amount of correlation between data. (primary) keys lookup vs. aggregation functions over large datasets
- the need for data to be accurate and correctness and tolerance if it isn't (yet) correct/up to date

and much less by the implementation of the database being Relational, OO, XML, Cloud or Tuple based. A read only (federated) RDBMS, where data is mostly queried by Type/Table and primary key and stale data is not an issue, will scale very well. On the other hand a non-relational, Cloud or NoSQL database in which highly connected data is being updated based on queries with composite filters using inequality conditions on aggregated data, won't scale at all.

AnsuGisalas

Is there a reason why it reads out "No-school", or is it just me? :)

m1scha_m

Does it support spatial data?

Jaqui

of technology trying to move in a circle. just as thin clients are going back to the old big iron type workstation in most ways, a non relational database engine, without a Structured Query Language to work with, is going back to an older model that was proved to be more costly to work with. costly in development time. maybe the DBAs need to work at improving the design of the databases they are working with, to make the workload easier for the engines to handle.

Tony Hopkinson

Relational databases and the thinking behind them are endemic in the business world. No point in waffling to your business types about scalability when they can't get at the data, or they have to wait for someone like us to write a few million lines of code so Jim's access link and one page crystal report will still work. Elegant, aesthetic and wonderfully impractical...

Gis Bun

"...is gaining attention in the enterprise." Hmmmm. First time I [or a friend of mine] ever heard of this database.

RealGem

That's where I would put this stuff. Cloud in general might be at the peak of the hype cycle just before it plummets into the trough of disillusionment, but it sounds like NoSQL is just starting to ride up to the peak. Maybe in a decade, if the technology is still viable and if it has matured, and if it is stable, then maybe I'll think about basing my entire company's operations off of it. Until then, neat idea but gimme a raincheck.

ludicosman

We did development and production on Google BigTable, next to lots of IBM DB2 usage. So, from our real-life experience I can add two additional observations: 1) because features like transactional support, referential integrity, etc., are missing, some support has to be built into the application itself, and thus more development, test and maintenance time; 2) small changes might be easy to implement with NoSQL, but a major model change is much harder compared to SQL, mainly because of the necessary tight relation between the application and the NoSQL model.

JackOfAllTech

"RDBMS Systems"? I bet you say NIC card too.

LawrenceFine

OK, call me out of touch but this is the first I ever heard of a NoSQL database, probably because I don't stay too up to date on open-source projects. And I am betting that many of you CIOs and directors haven't heard of it before either. I think the author's statement about CIOs' reluctance to move off of tried and true RDBMS without a big name producing a NoSQL product is a huge understatement.

Tony Hopkinson

Still new to a lot of people. Including you. Total database management is a relational database, NoSQL is an extremely misleading term to describe things like name value stores. Non relational DBMS's. Better still there are SQLesque interfaces for them anyway...

Tony Hopkinson

How are the Gartner advised non technical management who bought the OODBMS is a replacement for RDBMS bollocks going to cya if you start littering the marketplace with facts and stuff...

ivzae

Am I wrong, but was the Object Oriented DBMS not an alternative to the Relational DBMS ? Is it still used ?

gcottman

Time for a shameless vendor plug here... :-) Quest Software's Toad for Cloud Databases was inspired by the hilarious Fault Tolerance cartoon at "http://browsertoolkit.com/fault-tolerance.png". Toad for Cloud Databases offers SQL access to NoSQL databases, including SimpleDB, HBase, Azure Table Services and very soon Cassandra. Check it out at "toadforcloud.com".

Tony Hopkinson

I was playing about with a similar concept over ten years ago. All they've done is found a better problem to solve with it, not solved its problems.

Slartibartfast

I got deja vu reading this. It was just like when IBM announced the 4300 mainframes and told everyone you didn't need system programmers any longer ... no, wait that's showing my age... But seriously, isn't this going backwards? When we all went RDBMS from IMS/DB and VSAM and all the other non-relational DB's, didn't we do it for the reasons that are now being touted as the reasons we shouldn't use them?

Tony Hopkinson

I mean relational databases weren't invented for a laugh. Put all the rules back into the application logic and we are back where we started, which wasn't all that damn clever. Want to put some rules in then you need to expose the blob contents. Too many people trying to keep their cake and eat it on this one. Worse still the good enough attitude that is prevalent in corporate IT even with competent people, is going to leave us with an abortion. Dual solution is the only practical way to go IF you can describe your data in terms of discrete documents / property bags. Do that then mine it. If scaling is not an issue then don't bother with it, no point spending that much resource just to climb on the hype wagon. Who remembers ObjectStore? :(

robo_dev

the lack of inherent referential integrity could create some really interesting and messy databases, and having transactional support as an external function could create some real scary security issues. Having less structure and fewer constraints makes a big database more flexible and easier/faster to search. Therefore I could see that NoSQL would be very suitable for data mining or fuzzy logic searching (e.g. Google). But for an application that has to perform a specific task, like process your payroll, the more structure and tighter constraints, the better.

jack

I've been trying to get my head around NoSQL for a little while now and this article seems like a very good primer. I think some practical, real world examples would make a nice follow up article.

mwclarke1

I know that is right, most CIOs, but most any executive in the major fortune companies, other than a few of the high tech ones, have blinders on toward any technological advancement. For one, they would rather spend 10 times the money on expense for maintaining ancient technology than lose their grip on a fraction of capital investment expenditures. Also, at our company, it is getting harder to have real qualified people who know what they are doing. Starts with if it does not come from and be supported by the big three letter word company, then it might as well not exist. We have a huge IT staff but besides putting in the CD/DVD and running setup.exe, anything else we are calling the vendor. Even if we know how to fix something, if it is a production issue managers are calling our support services even if that takes a much longer time to repair than if we were to dive right in, even if one of the few remaining experts knows what they are doing; management does not respect or trust anyone anymore it seems. Management calls the shots on what to buy, no one specs equipment, they let the vendor tell us what we need, many times buying 10 times the hardware that is needed, always ready to waste as much money as needed to get that project done or fix an immediate issue. More layoffs next month though :-(

robo_dev

MySQL, yes, MS SQL, yes. But not NoSql. Since every single IT system that uses a database relies on SQL, a CIO would have to be a foaming-at-the-mouth lunatic to recommend a sea change, especially if it is not promoted by Oracle, Microsoft, or Sybase.

wdavidson

True, and yes they're still being used. But typically where rdbm's couldn't meet the requirements. I appreciate the fact that CIO's don't want to deal with change, and dba's feel they can make an rdbms do what needs to be done, what we can't deny is the amount of data being stored is growing significantly. Coming from an oodbms vendor, who also has a Not Only SQL offering (we don't say no sql), what's driving the early adoption isn't the change in technology, but the change in requirements. 10 years ago we did a petabyte install, but it was like the only one in the world. Now there are many, so some of these changes are out of necessity, not technological trail blazing.

Tony Hopkinson

In terms of a storage abstraction then yes. But so is a file, or even a filing cabinet... A relational database is storage abstraction and data integrity, that was the point. If all we needed was a slot to store something, no one would have bothered with them in the first place. So if you need relations, then no, they are not an alternative. Coming up with a non-relational description of a system (particularly an existing one) means you'll be fighting a couple of decades of "But it's a (relational) database". As anyone who's worked with a poor relational database implementation knows, you can't put data integrity back in without risking data and function. So you've got to make an informed choice, the consequences are inherent to the model. Attempts to use one and pretend it's the other are going to be expensive, very expensive.

Tony Hopkinson

I can use it on Xml files. (Linq) I can use it on text Files. (ODBC) I can even use it on the registry. (WMI) Not one of them is relational (well ID / IDREF in Xml at a push). SQL Azure simply provides a familiar way of wording a query. Relational databases are efficient at doing queries because of the embedded relations, not because of the syntax and semantics of SQL. I'm not arguing for keeping relational databases at all times, just against forgetting why they were invented in the first place and that was so you could link orders to customers and then construct a query in some language that just happened to be SQL...

nwallette

... should be about as optional as ICMP. Seriously. This is a BIG issue. You don't just trust any developer/PHP-hacker to Do The Right Thing if a transaction fails partway through. This is why there are teams of devoted, specialist programmers out there solving this problem for us, once, properly. So that when we write little one-off applications, we trust the data that comes and goes from the DB. Sounds to me like this is a niche technology to be used in special circumstances. But the reality is it will probably be abused by IT hipsters.. some golden child with way too little sense and way too much authority. And then some poor code-monkey will get the task of reinventing the wheel to turn a non-RDBMS back into a RDBMS without the benefit of extensive testing and all the optimization that went in to the native systems. Swell. ;-)

pgit

principal Poop... =D

online

I remember hearing similar words in the early 80s, when those useless little microcomputers starting knocking at the office doors. Never say never.

Tocsin

"every single IT system that uses a database relies on SQL" ?! Maybe you should ask IBM about that! There was a time before SQL (think hierarchic and network DBMSs), and many CIOs were reluctant to move to an untried and poorly-scaleable RDBMS. I can think of some big-iron systems that _still_ run on a non-SQL DB.

Tony Hopkinson

Driven by changes in requirements. Indeed. DBAs who think they can make an RDBMS do what needs to be done. I've worked with some of them, they are dumb as the idiots who try to graft relational functionality on an OODBMS.

Realvdude

Reminds me of the old adage about using the right tool for the particular job.

Tony Hopkinson

1 - 3 - 5 Every year you go for short term stuff and then change the three and five year plans to look farsighted...

Jaqui

comes from the only plan for 5 years thing. they need to start planning for 20 years to change that thinking.

Tony Hopkinson

File based in main frames. HP3000 Image, Paradox even. The document based stuff I'm working on now would lend itself very well to NoSQL style, indeed SaaS, cloud etc is one of the strategic objectives. I don't have a problem with NoSQL as a tech any more than I do the cloud, or windows or nix, or .Net. Just people who forget that all systems are engineering compromises, to get an advantage in one area, you sacrifice something else. Choose the tech based on what you need, not let some vendor choose it for you and then take what you can get out of it. I'm seriously tired of people who seem to glean their technical knowledge from Gartner soundbites though. Non relational databases scale well because essentially it's a straight storage issue. Better still it's simply a question of throwing hardware at the problem until you are covered again. As soon as you have a set of tables with its relations spanning more than one volume and transactionality and multi user, you've got some huge problems and no matter what you do, more hardware simply moves the bottleneck and you've got diminishing returns as well. Clever systems cost up front, simple systems cost later on. Price of doing business. I can't help but feel it's short term cost saving that's driving the decision making as opposed to long term viability. Can't think where I'd get an off the wall idea like that though....

Jaqui

what is needed is a new relational engine that addresses the scalability issue once and for all. I remember working with dbase when it was a database engine, a non relational non sql one at that.

JackOfAllTech

I was familiar with the department... phrase but had never heard the rest of it.

pgit

"department of redundancy department" is from Firesign Theatre skit. A radio announcement is brought to you by "department of redundancy department and the Natural Guard." As opposed to the "National Guard." I saw the fellow's post mentioning the 'department' and recognized the source. If that fellow comes back to this thread he/she could confirm whether the Firesigns are the source. I'd have to imagine it is, though I rarely come across anyone familiar with them, increasingly so with time. But they are well worth checking out. Dave Ossman's "How Time Flys" is simply amazing. So much of our present day was predicted back then (1970's) utilizing one of the most obtuse forms of humor you'll ever encounter. Most people I tried to introduce FT to couldn't handle it. It sounded "insane." The troupe actually made an album called "not insane," which is the hardest to take of all their works, but nevertheless contains the same deep philosophical and political underpinnings found in all their works. A lot of people know a snippet or two of their work, even though they don't realize where they got it. One of the more common skits I find people are familiar with is "chicken man!" ('he's everywhere-he's everywhere!!") http://www.firesigntheatre.com/index2.html

JackOfAllTech

Please explain, that doesn't make any sense.

pgit

I did get it, read my post again... "automated ATM teller machine..." So far mine appears to be the only single sentence to use all the words behind the initials. I do hear such things a lot, it doesn't bother me (usually) because I know what the person means. Getting the point across is the goal. Almost nobody uses technically correct English anymore. Not even in writing. ...and the Natural Guard...

JackOfAllTech

ATM = Automated Teller Machine
Saying ATM machine = Automated Teller Machine machine
NIC = Network Interface Card
Saying NIC card = Network Interface Card card
Get it now?

pgit

My bank has installed automated ATM teller machines. Ask your bank if they plan to upgrade the equipment any time soon.

RealGem

And remember to never make it the same as your SIN number.

steve

yep there were a few reasonable applications built in Pick and Mumps back in the day... now that's Universe/Unidata/Reality-x/jbase/choose-your-pick-flavour and Cache, and the problem with Pick was that you could bypass the data dictionary, or have duplicated definitions, which led to inevitable bit rot. But the query languages were cool. Maybe some of these newer databases could learn some lessons from there. I have had a brief look at couchdb and javascript as a query language? Pleaassse!

robo_dev

In most IT organizations, being on the 'bleeding edge' of new technology only brings risk and headaches. My real comment should have been that 'the vast majority of IT system databases at every company I've ever worked for in my more than 25 years of experience' have used SQL. I admit there were one or two esoteric data analytic systems that used their own voodoo-magic database, but those were far from the norm.

ChuckyKuchs

What about the systems that leverage the ABL or 4GL languages?
