
10 things you should know about NoSQL databases

The relational database model has prevailed for decades, but a new type of database -- known as NoSQL -- is gaining attention in the enterprise. Here's an overview of its pros and cons.

For a quarter of a century, the relational database (RDBMS) has been the dominant model for database management. But, today, non-relational, "cloud," or "NoSQL" databases are gaining mindshare as an alternative model for database management. In this article, we'll look at the 10 key aspects of these non-relational NoSQL databases: the top five advantages and the top five challenges.


Five advantages of NoSQL

1: Elastic scaling

For years, database administrators have relied on scale up -- buying bigger servers as database load increases -- rather than scale out -- distributing the database across multiple hosts as load increases. However, as transaction rates and availability requirements increase, and as databases move into the cloud or onto virtualized environments, the economic advantages of scaling out on commodity hardware become irresistible.

RDBMS might not scale out easily on commodity clusters, but the new breed of NoSQL databases is designed to expand transparently to take advantage of new nodes, and these databases are usually designed with low-cost commodity hardware in mind.
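
One widely used technique for this kind of transparent expansion is consistent hashing, in which keys and nodes are placed on the same hash ring so that adding a node moves only a fraction of the keys. The article does not say which technique any given product uses, so treat the following Python as a minimal sketch under that assumption: the HashRing class, the node names, and the vnodes parameter are all hypothetical, and real systems layer replication and failure handling on top.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring (hypothetical; no replication or failover)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual points per node, to even out the load
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place the node on the ring at `vnodes` pseudo-random positions."""
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring position after the key's hash."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


# Start with three nodes and record where each key lives.
ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}

# Scale out by one node and see which keys had to move.
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved to the new node")
```

In this sketch, every key that changes owner lands on the newly added node; keys on the existing nodes are not shuffled among themselves, which is what keeps expansion cheap.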

2: Big data

Just as transaction rates have grown out of recognition over the last decade, the volumes of data that are being stored also have increased massively. O'Reilly has cleverly called this the "industrial revolution of data." RDBMS capacity has been growing to match these increases, but as with transaction rates, the constraints of data volumes that can be practically managed by a single RDBMS are becoming intolerable for some enterprises. Today, the volumes of "big data" that can be handled by NoSQL systems, such as Hadoop, outstrip what can be handled by the biggest RDBMS.

3: Goodbye DBAs (see you later?)

Despite the many manageability improvements claimed by RDBMS vendors over the years, high-end RDBMS systems can be maintained only with the assistance of expensive, highly trained DBAs. DBAs are intimately involved in the design, installation, and ongoing tuning of high-end RDBMS systems.

NoSQL databases are generally designed from the ground up to require less management: automatic repair, data distribution, and simpler data models lead to lower administration and tuning requirements -- in theory. In practice, it's likely that rumors of the DBA's death have been slightly exaggerated. Someone will always be accountable for the performance and availability of any mission-critical data store.

4: Economics

NoSQL databases typically use clusters of cheap commodity servers to manage the exploding data and transaction volumes, while RDBMS tends to rely on expensive proprietary servers and storage systems. The result is that the cost per gigabyte or transaction/second for NoSQL can be many times less than the cost for RDBMS, allowing you to store and process more data at a much lower price point.

5: Flexible data models

Change management is a big headache for large production RDBMS. Even minor changes to the data model of an RDBMS have to be carefully managed and may necessitate downtime or reduced service levels.

NoSQL databases have far more relaxed -- or even nonexistent -- data model restrictions. NoSQL key-value stores and document databases allow the application to store virtually any structure it wants in a data element. Even the more rigidly defined BigTable-based NoSQL databases (Cassandra, HBase) typically allow new columns to be created without too much fuss.

The result is that application changes and database schema changes do not have to be managed as one complicated change unit. In theory, this will allow applications to iterate faster, though, clearly, there can be undesirable side effects if the application fails to manage data integrity.
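
Both halves of that trade-off can be illustrated in a few lines of Python. This is only a sketch: a plain dict stands in for a document store, and the key names, document fields, and the validate_user helper are all hypothetical rather than the API of any real NoSQL product.

```python
# A plain dict stands in for a document store: keys map to JSON-like documents.
store = {}

# Version 1 of the application writes a minimal user document.
store["user:1"] = {"name": "Ada", "email": "ada@example.com"}

# Version 2 adds nested and list-valued fields with no schema migration.
store["user:2"] = {
    "name": "Grace",
    "email": "grace@example.com",
    "phones": [{"type": "mobile", "number": "555-0100"}],
    "preferences": {"newsletter": True},
}


def validate_user(doc: dict) -> list:
    """Return a list of problems; the store itself enforces nothing."""
    problems = []
    for field in ("name", "email"):
        value = doc.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or invalid field: {field}")
    return problems


# Nothing stops a buggy writer from storing a malformed document...
store["user:3"] = {"nmae": "Linus"}  # typo in the field name goes unnoticed

# ...so the application has to catch it.
for key, doc in store.items():
    print(key, validate_user(doc) or "ok")
```

The first two writes show the flexibility: the document shape changes without touching the store. The third shows the side effect: the integrity check a relational schema would have enforced now lives in application code.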

Five challenges of NoSQL

The promise of NoSQL databases has generated a lot of enthusiasm, but there are many obstacles to overcome before they can appeal to mainstream enterprises. Here are a few of the top challenges.

1: Maturity

RDBMS systems have been around for a long time. NoSQL advocates will argue that their advancing age is a sign of their obsolescence, but for most CIOs, the maturity of the RDBMS is reassuring. For the most part, RDBMS systems are stable and richly functional. In comparison, most NoSQL alternatives are in pre-production versions with many key features yet to be implemented.

Living on the technological leading edge is an exciting prospect for many developers, but enterprises should approach it with extreme caution.

2: Support

Enterprises want the reassurance that if a key system fails, they will be able to get timely and competent support. All RDBMS vendors go to great lengths to provide a high level of enterprise support.

In contrast, most NoSQL systems are open source projects, and although there are usually one or more firms offering support for each NoSQL database, these companies often are small start-ups without the global reach, support resources, or credibility of an Oracle, Microsoft, or IBM.

3: Analytics and business intelligence

NoSQL databases have evolved to meet the scaling demands of modern Web 2.0 applications. Consequently, most of their feature set is oriented toward the demands of these applications. However, data in an application has value to the business that goes beyond the insert-read-update-delete cycle of a typical Web application. Businesses mine information in corporate databases to improve their efficiency and competitiveness, and business intelligence (BI) is a key IT issue for all medium to large companies.

NoSQL databases offer few facilities for ad-hoc query and analysis. Even a simple query requires significant programming expertise, and commonly used BI tools do not provide connectivity to NoSQL.
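
To make the gap concrete, consider a question that a relational database answers with one statement, such as SELECT country, COUNT(*), SUM(spend) FROM customers GROUP BY country. Against a store that offers only key lookups or scans, a developer typically writes the grouping by hand, often in a map/reduce style. The Python below is a minimal sketch of that style; the record layout and the map_phase, reduce_phase, and run_query names are hypothetical and do not come from any particular product.

```python
from collections import defaultdict

# Hypothetical records, as they might come back from a full scan of the store.
records = [
    {"id": 1, "country": "AU", "spend": 120.0},
    {"id": 2, "country": "AU", "spend": 80.0},
    {"id": 3, "country": "NZ", "spend": 50.0},
    {"id": 4, "country": "US", "spend": 200.0},
    {"id": 5, "country": "US", "spend": 25.0},
    {"id": 6, "country": "US", "spend": 75.0},
]


def map_phase(record):
    """Emit (key, value) pairs; here, one pair per record."""
    yield record["country"], record["spend"]


def reduce_phase(key, values):
    """Combine all values for one key into a single result row."""
    return {"country": key, "customers": len(values), "total_spend": sum(values)}


def run_query(records):
    """Group mapped values by key, then reduce each group, sorted by key."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    return [reduce_phase(key, values) for key, values in sorted(groups.items())]


for row in run_query(records):
    print(row)
```

Changing the question, say to group by a different field or add a filter, means changing and redeploying code rather than editing a query, which is the barrier to ad-hoc analysis described above.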

Some relief is provided by the emergence of solutions such as Hive or Pig, which can provide easier access to data held in Hadoop clusters and, perhaps eventually, other NoSQL databases. Quest Software has developed a product -- Toad for Cloud Databases -- that can provide ad-hoc query capabilities to a variety of NoSQL databases.

4: Administration

The design goals for NoSQL may be to provide a zero-admin solution, but the current reality falls well short of that goal. NoSQL today requires a lot of skill to install and a lot of effort to maintain.

5: Expertise

There are literally millions of developers throughout the world, and in every business segment, who are familiar with RDBMS concepts and programming. In contrast, almost every NoSQL developer is in a learning mode. This situation will resolve itself naturally over time, but for now, it's far easier to find experienced RDBMS programmers or administrators than NoSQL experts.

Conclusion

NoSQL databases are becoming an increasingly important part of the database landscape, and when used appropriately, can offer real benefits. However, enterprises should proceed with caution and with full awareness of the legitimate limitations and issues that are associated with these databases.


About the author

Guy Harrison is the director of research and development at Quest Software. A recognized database expert with more than 20 years of experience in application and database administration, performance tuning, and software development, Guy is the author of several books and many articles on database technologies and a regular speaker at technical conferences.

Comments
xcelarsolutions

New NoSQL will include support for structured, unstructured, and semi-structured data. We should expect a lot of enterprise-level functions in NoSQL solutions.

mikojava

NoSQL can also run in memory, as with Oracle Coherence, VMware GemFire, Software AG Terracotta, and Hazelcast.

mrgreenw

In general I'd like to comment on the thought that NoSQL databases are new. They are not. Before SQL databases came about there were a number of commercial database managers available. My personal favorite is Supra from Cincom Systems, Inc. This database manager is the third generation derivative born from the same DNA which formed the original non-SQL Total Database Manager from Cincom. Supra is used in production today with thousands of users accessing applications built on Supra databases. It is a high volume, high reliability, and high performance database manager available on MVS, VSE, Open VMS, Linux, Unix and Windows platforms.

J N

That was possibly the most ignorant explanation I've ever read on this topic.

derjanni

Thanks for the article. Although I often prefer relational databases like MySQL or Oracle because of data integrity and the comfort to know how it works, there is an article you might be interested in that I found valuable: http://www.kammerath.co.uk/nosql-on-the-spot.html - it's a pretty good comparison of all options.

nosql.io

Great article. I think 2012 is going to be the prime time for NoSQL.

herbmeehan

In a random email from Dice posting about a video game website looking for a noSQL person. The same posting had the most obscure technologies ever. They are a web company that is looking for a Python / NoSQL expert. Good luck with that one...

martymen

If you want to see a NoSQL-based system that is mature, stable and highly usable, have a look at EPIC or the V.A.'s VistA. Both are medical software systems based on the MUMPS hierarchical database that has been in use since the late '80s. As pointed out by others, this schema-free system can be used to model just about any kind of database you want - including relational - with huge scalability of data content and user number. It scores on the first nine points above, and an effort is now underway to address number 10 by training a new cadre of MUMPS programmers. Martin Mendelson, MD, PhD

estreur

I see the same misconception a lot, including in this article. Whether an application/database scales well is determined to a very large extent by

- the nature of the IO: the ratio between reads and writes (insert/update/delete)
- the amount of correlation between data: (primary) key lookups vs. aggregation functions over large datasets
- the need for data to be accurate and correct, and the tolerance if it isn't (yet) correct/up to date

and much less by the implementation of the database being Relational, OO, XML, Cloud or Tuple based.

A read-only (federated) RDBMS, where data is mostly queried by Type/Table and primary key and stale data is not an issue, will scale very well. On the other hand, a non-relational, Cloud or NoSQL database in which highly connected data is being updated based on queries with composite filters using inequality conditions on aggregated data won't scale at all.

AnsuGisalas

Is there a reason why it reads out "No-school", or is it just me? :)

m1scha_m9

Does it support spatial data?

Jaqui

of technology trying to move in a circle. just as thin clients are going back to the old big iron type workstation in most ways, a non relational database engine, without a Structured Query Language to work with, is going back to an older model that was proved to be more costly to work with. costly in development time. maybe the DBAs need to work at improving the design of the databases they are working with, to make the workload easier for the engines to handle.

Tony Hopkinson

Relational databases and the thinking behind them are endemic in the business world. No point in waffling to your business types about scalability when they can't get at the data, or they have to wait for someone like us to write a few million lines of code so Jim's access link and one page crystal report will still work. Elegant, aesthetic and wonderfully impractical...

Gis Bun

"...is gaining attention in the enterprise." Hmmmm. First time I [or a friend of mine] ever heard of this database.

RealGem

That's where I would put this stuff. Cloud in general might be at the peak of the hype cycle just before it plummets into the trough of disillusionment, but it sounds like NoSQL is just starting to ride up to the peak. Maybe in a decade, if the technology is still viable and if it has matured, and if it is stable, then maybe I'll think about basing my entire company's operations off of it. Until then, neat idea but gimme a raincheck.

ludicosman

We did development and production work on Google BigTable, next to lots of IBM DB2 usage. So, from our experiences in real life I can add two additional observations: 1) because features like transactional support, referential integrity, etc., are missing, some support has to be built into the application itself, which means more development, test and maintenance time; 2) small changes might be easy to implement with NoSQL, but a major model change is much harder compared to SQL, mainly because of the necessary tight relation between the application and the NoSQL model.

JackOfAllTech

"RDBMS Systems"? I bet you say NIC card too.

LawrenceFine

OK, call me out of touch but this is the first I ever heard of a NoSQL database, probably because I don't stay too up to date on open-source projects. And I am betting that many of you CIOs and directors haven't heard of it before either. I think the author's statement about CIOs' reluctance to move off of tried and true RDBMS without a big name producing a NoSQL product is a huge understatement.
