How Amazon helped bring NoSQL to the enterprise mainstream

Amazon Dynamo started out as an internal itch that helped Amazon kick its RDBMS habit. Here's how it could do the same elsewhere.

Amazon had a need for a non-relational, highly scalable approach to data, and every enterprise is now swimming in NoSQL as a result. As Amazon CTO Werner Vogels outlined, Amazon's "straining database infrastructure on Oracle led us to evaluate if we could develop a purpose-built database that would support our business needs for the long term." Dynamo was born, initially as a published paper that launched a series of promising NoSQL databases like Apache Cassandra.

In so doing, Amazon arguably began to lay the foundation for a clear and present danger to Oracle's database reign. Dynamo wasn't the first NoSQL database, nor is the subsequent DynamoDB the biggest challenger to Oracle. But by scratching an internal itch for greater scale, Amazon set up the start of a database war that threatens Oracle's long-term dominance.

Scratching the itch

Amazon didn't set out to topple Oracle. Far from it. As Vogels described, Amazon was simply trying to take care of its increasingly demanding database needs, as Oracle wasn't keeping pace.

"It all started in 2004 when Amazon was running Oracle's enterprise edition with clustering and replication. We had an advanced team of database administrators and access to top experts within Oracle. We were pushing the limits of what was a leading commercial database at the time and were unable to sustain the availability, scalability and performance needs that our growing Amazon business demanded."

While a traditional enterprise might have kept coloring inside the lines of relational database dogma (strong consistency!), Amazon looked at its actual needs and realized they were out of whack with prevailing industry wisdom: "We prioritized focusing on requirements that would support high-scale, mission-critical services like Amazon's shopping cart, and questioned assumptions traditionally held by relational databases such as the requirement for strong consistency.... A deep dive on how we were using our existing databases revealed that they were frequently not used for their relational capabilities," Vogels said.

While today this observation might not cause a stir (in large part thanks to the mainstreaming of NoSQL), at the time it was somewhat revolutionary. The only serious databases were relational databases. Everything else was a toy, right?

As it turned out, the answer was an emphatic "no." Given the need for a big data database, the "toy" was actually the outmoded RDBMS that only understood vertical scale and struggled with the diversity of data that online operations imposed. Amazon published its Dynamo research paper and, after further investigation, built a database service called DynamoDB to expose this new-school database to external customers who increasingly needed the scalability, security, durability, and performance of a cloud-based NoSQL database.

The beginning of an end?

Oracle, for its part, has continued to churn out billions of dollars in revenue during this apparent attack on its database business. Partly, this stems from inertia--databases hold enterprise data and swapping them out is a risky endeavor that few CIOs will tackle without a massive, pressing need. And partly this comes down to fit--most enterprise applications still take advantage of relational data, and Oracle is the RDBMS gold standard.

And yet....

"What started out as an exercise in solving our own needs in a customer obsessed way, turned into a catalyst for a broader industry movement towards non-relational databases, and ultimately, an enabler for a new class of internet-scale applications," wrote Vogels, calling out the growing momentum toward "internet-scale applications."

The nature of enterprise data keeps changing, with a greater percentage tuned to NoSQL each day. Out of habit and self-preservation, DBAs keep turning to the database with which they grew up (Oracle), but over time the nature of internet-enabled applications will shift the focus toward more flexible, next-generation databases.

This shift is assisted, in part, by AWS (as well as Google and Microsoft) removing the "undifferentiated heavy lifting" inherent in database management, as AWS general manager Matt Wood told me in an interview. "RDS was basically a way to remove customers' need to deal with clustering and such on EC2 and have them focused on running MySQL as a managed service," he said.

In making it easy to use databases in the cloud, be they relational or non-relational in nature, AWS is making it that much easier to move away from the more cumbersome Oracle approach. By AWS cloud chief Andy Jassy's reckoning, the "pace [is] quickening" for database migrations from Oracle to AWS, with the total now standing at 40,000.

This won't happen in a day, or even in a year. It took us several decades to arrive at the RDBMS dominance we have today. It will take several more to get out. Enterprise data changes slowly, but it is changing, and the cloud is accelerating that change.
