This is a guest post from Larry Dignan of TechRepublic’s sister site ZDNet. You can follow Larry on his ZDNet blog Between the Lines, or subscribe to the RSS feed.
When Amazon Web Services’ latest (and arguably most valuable) service is a system that lets you ship terabytes of data to the cloud via snail mail, you just have to chuckle. Yes, folks, for all the fancy talk of cloud computing, terabytes (not to mention petabytes) of data and technological advancement, the Sneakernet is alive and kicking.
The Sneakernet, where someone puts data on a disk, flash drive or similar device and carries it to another computer, is arguably one of our most enduring networks. I still use it all the time. I’m sure I could network my home devices together, but the Sneakernet works just fine.
Scale the Sneakernet up and you understand why Amazon is launching a service called Import/Export. There’s too much data to move to the cloud and not enough bandwidth to get it there quickly. Why take five days to move data, and hog all your bandwidth while you’re at it, when you can toss it on a storage brick of some sort and just overnight it?
Amazon CTO Werner Vogels explains:
In some ways the computing world has changed dramatically; networks have become ubiquitous and the latency and bandwidth capabilities have improved immensely. Next to this growth in network capabilities we have been able to grow something else to even bigger proportions, namely our datasets. Gigabyte data sets are considered small, terabyte sets are common place, and we see several customers working with petabyte size datasets.
No matter how much we have improved our network throughput in the past 10 years, our datasets have grown faster, and this is likely to be a pattern that will only accelerate in the coming years. While network may improve another order of magnitude in throughput, it is certain that datasets will grow two or more orders of magnitude in the same period of time.
Simply put, if you want to move a terabyte data set to EC2, it will take you a while. On an enterprise scale, this data-moving problem is yet another hindrance to cloud computing adoption. Amazon gives the following time frames for shipping a terabyte dataset over the network:
But that doesn’t capture the true costs. Microsoft Research notes that you still have to maintain that network. And there’s labor and support.
Here’s a look at the slightly dated statistics from a 2002 Microsoft Research paper:
Microsoft Research’s Jim Gray concluded that Sneakernets are the answer to the above conundrum:
What is the best way to move a terabyte from place to place? The Next Generation Internet (NGI) promised gigabit per second bandwidth desktop-to-desktop by the year 2000. So, if you have the Next Generation Internet, then this transfer is just 8 trillion bits, or about 8,000 seconds – a few hours wait. Unfortunately, most of us are still waiting for the Next Generation Internet – we measure bandwidth among our colleagues at between 1 megabit per second (mbps) and 100 mbps. So, it takes us days or months to move a terabyte from place to place using the Last Generation Internet.
That passage was written in 2002. And guess what? We’re still waiting. Simply put, the Sneakernet is the most efficient means of moving a terabyte of data around.
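Gray’s arithmetic is easy to reproduce. The short Python sketch below is an illustration, not anything from Gray’s paper or Amazon: the function name `transfer_seconds` is made up for this example, a terabyte is taken as 10^12 bytes to match his “8 trillion bits,” and the link is assumed to run at its full rated speed with no protocol overhead, so real transfers would be slower.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Idealized time to push size_bytes over a link (no overhead, full utilization)."""
    return size_bytes * 8 / link_bits_per_second


TERABYTE = 10**12  # decimal terabyte, i.e. 8 trillion bits

# The three link speeds mentioned in Gray's passage.
for label, bps in [("1 Mbps", 1e6), ("100 Mbps", 1e8), ("1 Gbps", 1e9)]:
    seconds = transfer_seconds(TERABYTE, bps)
    print(f"{label:>8}: {seconds:,.0f} s = {seconds / 86400:.1f} days")
```

At 1 Gbps the result is the 8,000 seconds Gray cites. At 1 Mbps the same terabyte needs about 8 million seconds, or roughly three months, which is where his “days or months” comes from.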
Given that fact, Amazon’s Sneakernet, the Import/Export service, may become its most appreciated, if not its most technologically advanced, feature. Go figure. In a nutshell, Import/Export allows you to ship data on storage devices with a manifest that explains how and where to load the data and map it to Amazon’s storage system.
Here’s when a move to Import/Export makes sense:
Now there are costs. Amazon will charge you $80 per storage device handled and $2.49 per data loading hour. And then there’s the usual storage pricing. But add it up and it’s cheaper per terabyte than waiting a week for a dataset to move.
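To get a feel for those two fees, here is a rough Python sketch. Only the $80 and $2.49 figures come from the pricing quoted above; the function name `import_export_fee` and the 10-hour loading time in the example are assumptions made for illustration, and the usual storage charges, shipping costs and any rounding of partial hours are left out.

```python
DEVICE_HANDLING_FEE = 80.00   # dollars per storage device handled (quoted above)
LOADING_FEE_PER_HOUR = 2.49   # dollars per data loading hour (quoted above)


def import_export_fee(devices: int, loading_hours: float) -> float:
    """Handling plus loading fees only; storage and shipping are not included."""
    total = devices * DEVICE_HANDLING_FEE + loading_hours * LOADING_FEE_PER_HOUR
    return round(total, 2)


# Hypothetical job: one device that takes 10 hours to load.
print(f"${import_export_fee(devices=1, loading_hours=10):.2f}")
```

For that hypothetical job the fees come to $104.90. How many loading hours a real device needs depends on its size and interface speed, so treat the 10 hours as a placeholder.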
Will the Sneakernet ever go away? Nope. Gray sums it up:
Until we all have inexpensive end-to-end gigabit speed networks, terascale datasets will have to move over some form of sneaker net. We suspect that by the time the promised end-to-end gigabit (next generation Internet) arrives, we will be moving petabyte scale datasets and so will still need a Sneakernet solution.
Long live the Sneakernet.