If you want to scale like Apple, you might need to consider Cassandra. Matt Asay explains.
I wish I could tell you how Apple scales iTunes, iMessages, and other wildly popular services. But I can't. This is Apple, after all, and it's not in the habit of spilling secrets.
Sure, Apple finally told the world how it scales Siri (a massive Mesos cluster), but it's been tight-lipped about its use of MongoDB, HBase, Couchbase, and Cassandra, even though its job postings advertise all four.
And yet there are some things we know about Apple's Cassandra adoption, and it's valuable information for enterprises hoping to emulate Apple's success.
What we know
As a former executive at MongoDB, I know a lot about how Apple uses MongoDB, but I can't talk about it. Suffice it to say that Apple's use of different NoSQL technologies comes down to fitting the right tool for the right job.
In the case of Cassandra, it's apparently the right tool for many jobs, according to current job postings:
- MongoDB - 35 job listings (iTunes, Customer Systems Platform, and others, including several related to geographic information systems, a strength for MongoDB)
- Couchbase - 4 job listings (iTunes Social, though Couchbase was listed as one of several NoSQL skills that could apply)
- HBase - 33 job listings (Maps, Siri, iAd, iCloud, and more, often related to Hadoop deployments)
- Cassandra - 70 job listings (Maps, iAd, iCloud, iTunes, and more)
At least measured by job listings, Cassandra is Apple's dominant NoSQL database, with at least double the listings of any other.
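For readers who want to check the "double" claim, here is a minimal sketch that works the ratios from the four counts quoted above. The variable names are mine, and the "share" figure assumes each listing counts once per database even though some postings mention several, so treat it as a rough indicator rather than a headcount.

```python
# Job-listing counts exactly as quoted in the list above.
listings = {"MongoDB": 35, "Couchbase": 4, "HBase": 33, "Cassandra": 70}

cassandra = listings["Cassandra"]

# How many times more Cassandra listings there are than each other database.
ratios = {
    name: cassandra / count
    for name, count in listings.items()
    if name != "Cassandra"
}

# Cassandra's share of all database mentions across the four counts.
# Assumption: a posting naming two databases is counted once for each.
share = cassandra / sum(listings.values())

for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"Cassandra vs {name}: {ratio:.2f}x")
print(f"Cassandra share of mentions: {share:.0%}")
```

By this count Cassandra has exactly twice MongoDB's listings, slightly more than twice HBase's, and just under half of all mentions across the four databases.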
What does this translate to in terms of adoption?
A year ago, Apple said that it was running over 75,000 Cassandra nodes, storing more than 10 petabytes of data. At least one cluster was over 1,000 nodes, and Apple regularly gets millions of operations per second (reads/writes) with Cassandra.
It's breathtaking, if you stop to think about this scale.
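To put those figures in perspective, a back-of-the-envelope sketch of the average data per node follows. It assumes decimal petabytes and treats "over 75,000 nodes" and "more than 10 petabytes" as exact, which they are not; it also says nothing about replication or how unevenly data is spread across clusters, so read the result as an order of magnitude only.

```python
# Figures as reported above; both are lower bounds in the original claim.
nodes = 75_000
petabytes = 10

# Assumption: decimal units (1 PB = 10**15 bytes, 1 GB = 10**9 bytes).
data_bytes = petabytes * 10**15
avg_gb_per_node = data_bytes / nodes / 10**9

print(f"Average data per node: about {avg_gb_per_node:.0f} GB")
```

On these assumptions the average works out to roughly 133 GB per node, which suggests the deployment is sized for throughput and availability at least as much as for raw storage.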
Facebook database guru Mark Callaghan posits that Apple's Cassandra workload likely relates more to iMessage than iTunes, but whatever the project, it's massive... and it's not uncommon.
According to analyst Curt Monash, there are plenty of petabyte or half-petabyte Cassandra clusters, something that DataStax, the top Cassandra contributor, confirms.
But scale isn't the only thing Cassandra does well, particularly with a little help from DataStax.
What Cassandra does well
There are a variety of reasons developers love Cassandra, including its continuous availability, linearly scalable performance, operational simplicity, and easy data distribution across multiple data centers and cloud availability zones.
What this means, in the real world, is that Cassandra is good for applications that must always be on (e.g., for online transactions) or high-scale applications (like British Gas using Cassandra to store IoT sensor data and make it available immediately for analysis).
Analytics is generally not considered a strong suit for any NoSQL database, because the flexible schema in the database is a poor match for an analytics industry built around expectations of relational schema. This is where DataStax has been augmenting Cassandra.
As Monash points out, by combining Cassandra with Apache Spark in DataStax Enterprise, "Spark offers the potential to do many things [analytics] at in-memory speeds," and with "Spark, the new functions, and general scripting, there are several ways to do low-latency aggregations."
All of this means, returning to Apple, that Cassandra offers Apple the scale and increasingly the analytical horsepower to tackle an ever-expanding array of applications.
More NoSQL at Apple
It's telling, as Sandeep Parikh hints, that Apple ran into enough limitations with traditional relational databases, including the likelihood that they "cost way too much to scale out," that it now actively uses Cassandra, MongoDB, and other NoSQL technologies.
Heck, Apple even went so far as to buy the company behind FoundationDB, a NoSQL database.
As my former MongoDB colleague (and Wall Street analyst) Peter Goldmacher stressed to me in an interview, "It is reasonable to wonder if Apple's software products would have even been possible without NoSQL technologies."
Central among those, at perhaps double the adoption of any other (very roughly extrapolating from job listings), is Cassandra. Most companies don't have Apple's scale, but for those that aspire to it, Cassandra is worth a strong look.