How the cloud fits into the big data technology stack

Nick Hardiman peers into the technology stack that makes big data work.


What is the technology stack that makes big data go? And how does it work with cloud computing? Here’s how three successful companies – SoftLayer, Cloudant and Rosetta Stone – work at different layers of the big data technology stack.

Bare metal is the foundation of the big data technology stack

The foundation of a big data processing cluster is made of machines. Like the machines in a relational database cluster, these usually have plenty of memory, CPU and storage. However, big data machines don’t have to be scaled up – they can be scaled out by adding more machines. The ability to scale out makes them a good match for cloud computing.
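To make scaling out concrete, here is a minimal Python sketch of spreading record keys across a cluster by hashing them. The node names, key format and helper functions are invented for illustration and do not come from any product mentioned here. Simple modulo placement like this also moves most keys whenever the node count changes, which is one reason real data stores tend to use more careful schemes such as consistent hashing.

```python
import hashlib


def node_for_key(key: str, nodes: list[str]) -> str:
    """Pick a node for a key by hashing the key (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def spread(keys: list[str], nodes: list[str]) -> dict[str, int]:
    """Count how many keys land on each node."""
    counts = {node: 0 for node in nodes}
    for key in keys:
        counts[node_for_key(key, nodes)] += 1
    return counts


# Invented workload: 10,000 user records.
keys = [f"user-{i}" for i in range(10_000)]

# Scaling out from three machines to six roughly halves the load per machine.
three_nodes = spread(keys, ["node-1", "node-2", "node-3"])
six_nodes = spread(keys, [f"node-{i}" for i in range(1, 7)])

print("3 nodes:", three_nodes)
print("6 nodes:", six_nodes)
```

The point of the sketch is only that capacity grows by adding machines of the same size, rather than by buying a bigger machine.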

SoftLayer is a hardware IaaS provider – it does not deal with NoSQL directly, but it does deliver the clusters required to run NoSQL databases. Nathan Day, chief scientist at SoftLayer, said they can “deliver a cluster of servers for things like the NoSQL solutions, so with things like Riak and Mongo, a customer can come and say ‘I want my own cluster of NoSQL servers. I want three of them in Amsterdam. I want three of them in Singapore.’”

These machines are often physical rather than virtual because bare metal makes performance less painful – with the unfortunate side effect of making the bill more painful. Day commented on bare metal versus virtual machines. “We did a comparison test between deploying Mongo on bare metal and doing a deployment on a cloud… It’s very consistent on bare metal, as you’d expect, because you’re single tenant running on your hardware – it behaved very predictably. In a public virtual machine cloud, where you can’t control aspects of storage and even CPU and RAM access, the results varied wildly.”
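One way to put a number on “consistent” versus “varied wildly” is the coefficient of variation: the standard deviation of a set of measurements divided by their mean. The Python sketch below is only an illustration of that calculation. The latency samples are invented placeholders, not results from Day’s test, and the function name is hypothetical.

```python
import statistics


def coefficient_of_variation(samples: list[float]) -> float:
    """Relative spread of the samples: standard deviation divided by mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


# Invented placeholder latencies in milliseconds, NOT real benchmark data.
bare_metal_ms = [10.1, 10.3, 9.9, 10.0, 10.2]
shared_vm_ms = [8.0, 25.0, 11.0, 40.0, 9.5]

bare_metal_cv = coefficient_of_variation(bare_metal_ms)
shared_vm_cv = coefficient_of_variation(shared_vm_ms)

print(f"bare metal CV: {bare_metal_cv:.2f}")
print(f"shared VM CV:  {shared_vm_cv:.2f}")
```

A lower value means more predictable behaviour, whatever the average speed happens to be.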

The database service in the middle

Building this layer requires database expertise. The designers must answer tricky technical questions such as when to shard, how much memory is enough, and what the difference is between Hadoop and Cassandra.

Cloudant, the managed NoSQL provider, offers DBaaS to its customers, placing this DBaaS layer on platforms from IaaS providers like Rackspace, Azure and Joyent. If customers want cheap DBaaS, Cloudant can supply them with an AWS virtual cluster. If customers prefer hardware, Cloudant can supply a SoftLayer cluster like the one Day described. Cloudant CEO Derek Schoettle said, “Whereas they provide IaaS and we provide DBaaS, our joint customers benefit from a tight coupling of our respective services. As such, some of Cloudant’s biggest and most important accounts are on SoftLayer infrastructure.”

The top layer: business applications

Business applications form the top layer of the technology stack – the one that a customer interacts with. This is the layer where technology produces business value.

Mike Broberg, Marketing Communications Manager at Cloudant, talked about one of their customers: Rosetta Stone, a global company providing learning technology. Rosetta Stone manages social networking data for its customers, and Cloudant manages NoSQL for Rosetta Stone. “Data for the online education/social networking portion of their software is stored on Cloudant running in different data center locations around the world. Requests are routed to the closest copy of a user's data based on his or her geographical location. The idea is to move the data closer to the users who are accessing and changing it to reduce latency. That data is also replicated to other sites as it changes.”
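The routing idea Broberg describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cloudant’s implementation: the data center names and coordinates are invented, and straight-line distance stands in for “closest”, whereas a production service would more likely route on network measurements or DNS.

```python
import math

# Hypothetical replica sites as (latitude, longitude); not Cloudant's real list.
DATA_CENTERS = {
    "amsterdam": (52.37, 4.90),
    "singapore": (1.35, 103.82),
    "dallas": (32.78, -96.80),
}


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_data_center(user_location: tuple[float, float]) -> str:
    """Return the name of the replica site nearest to the user."""
    return min(DATA_CENTERS, key=lambda name: haversine_km(user_location, DATA_CENTERS[name]))


paris = (48.86, 2.35)
jakarta = (-6.20, 106.80)

print(closest_data_center(paris))    # amsterdam
print(closest_data_center(jakarta))  # singapore
```

Replication between sites, which Broberg also mentions, is a separate job that this sketch leaves out.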

Why would a clever business like Rosetta Stone, skilled in many technical areas, offload the lower layers of the technical stack to a third party? Software development is one of the most complex jobs in the world. Writing complex code takes so much concentration that everything else becomes an unwanted overhead (yes, this is why game engine programmers don’t wash).

Technical work that distracts developers from coding – like release control, testing, and code promotion – is either automated or offloaded. Even holistic DevOps teams are relieved to offload anything to do with the infrastructure, including data storage. A coder wants to plug code into a database API and not get distracted by the basics of keeping the infrastructure lights on, let alone the complex tasks of data store scaling, backup, and performance.

Stick with what you know

When you look at the big picture, the new big data technology stack is much like the old structured data stack. Take some hardware building blocks and some software building blocks, snap them together and construct a new data storage system.

Each layer of the big data technology stack takes a different kind of expertise. The cloud world makes it easy for an enterprise to rent expertise from others and concentrate on what it does best.


Nick Hardiman builds and maintains the infrastructure required to run Internet services. Nick deals with the lower layers of the Internet - the machines, networks, operating systems, and applications. Nick's job stops there, and he hands over to the ...

