As competition in the cloud space heats up, we are seeing more and more overlap in the services offered by the major providers. One service that I find particularly useful is semi-structured cloud storage, where you use a service call to upload data that may or may not be structured, and that data becomes instantly available to everyone else who has access to the storage space. This type of service allows for quick and simple sharing of data within distributed applications, or even between multiple applications, with the added benefit of having someone else responsible for replicating and backing it up.

Two of the most interesting services of this type are Amazon’s SimpleDB and Microsoft’s Azure Table Storage. Both offer very simple APIs for all operations, from inserting data to executing more complex queries. They also share the same basic premises: low administration, fast access to distributed data, scalability, availability, and a pay-as-you-go model. How, then, do these two services stack up?

Pricing

Amazon’s SimpleDB has a “Free Tier” that gives anyone up to 25 machine hours, 1 GB of storage, and 1 GB of outbound data transfer per month at no charge. This is enough for most simple experiments, and allows anyone to easily test their services. Beyond the free tier, pricing varies according to the data center where your data is hosted. For the US-EAST data center, the pricing is as follows:

  • $0.14 per machine hour consumed. Machine hours are calculated based on the amount of machine capacity used to complete a request of any kind, normalized to the hourly capacity of a 2007 1.7 GHz Xeon processor.
  • $0.12 per GB of outbound data transfer, for the first 10 TB. As transfer volumes grow, this price decreases.
  • $0.25 per GB of storage per month. There is also a 45-byte overhead on each item uploaded.

Amazon offers a detailed calculator so that anyone can estimate their costs with the system. By running some simple numbers, several different usage cases can be simulated. Most usage scenarios I tried came in under $100 per month, a reasonable price.

Microsoft’s Azure Storage has a slightly different pricing scheme. They have no “Free Tier”, though they do offer a free trial, and if your company is a start-up, it can probably enroll in Microsoft’s BizSpark program and get free credits. They have a simple calculator for cost estimation as well. The basic pricing is as follows:

  • $0.14 per GB of storage per month. No info on storage overheads. Storage is measured as the average over a month, so if you store 1 GB for the first half of the month and grow to 3 GB in the second half, you’ll only pay for 2 GB.
  • $0.01 per 10,000 data operations
  • $0.15 per GB of outbound data for North America and Europe; $0.20 per GB for Asia-Pacific

Pricing will vary a lot depending on the application type. In my case, it was basically the same for both providers. Azure has cheaper storage and no extra cost if your storage operations are more complex, but data transfer is more expensive.
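To make the comparison concrete, here is a rough sketch of the arithmetic using only the prices quoted above. The function names and the sample workload are made up for illustration. The sketch also assumes transfer stays under 10 TB, that the free tier is simply subtracted from usage, and it ignores SimpleDB's 45-byte per-item overhead, so treat it as an approximation rather than a replacement for either provider's calculator.

```python
# Prices as quoted in this post (US-EAST for SimpleDB, North America/Europe for Azure).
SIMPLEDB_PRICE = {"machine_hour": 0.14, "transfer_gb": 0.12, "storage_gb": 0.25}
SIMPLEDB_FREE = {"machine_hour": 25, "transfer_gb": 1, "storage_gb": 1}
AZURE_PRICE = {"storage_gb": 0.14, "per_10k_ops": 0.01, "transfer_gb": 0.15}


def average_storage_gb(periods):
    """Time-weighted average storage; periods is a list of (fraction_of_month, gb)."""
    return sum(fraction * gb for fraction, gb in periods)


def simpledb_monthly_cost(machine_hours, transfer_gb, storage_gb):
    """Estimate a SimpleDB bill, assuming the free tier is deducted from usage."""
    usage = {
        "machine_hour": machine_hours,
        "transfer_gb": transfer_gb,
        "storage_gb": storage_gb,
    }
    return sum(
        max(0.0, used - SIMPLEDB_FREE[item]) * SIMPLEDB_PRICE[item]
        for item, used in usage.items()
    )


def azure_monthly_cost(avg_storage_gb, operations, transfer_gb):
    """Estimate an Azure Storage bill from average storage, operations, and transfer."""
    return (
        avg_storage_gb * AZURE_PRICE["storage_gb"]
        + operations / 10_000 * AZURE_PRICE["per_10k_ops"]
        + transfer_gb * AZURE_PRICE["transfer_gb"]
    )


if __name__ == "__main__":
    # Hypothetical workload, not a measurement from this post.
    print(f"SimpleDB: ${simpledb_monthly_cost(100, 50, 10):.2f}")
    print(f"Azure:    ${azure_monthly_cost(10, 5_000_000, 50):.2f}")
    # The averaging rule: 1 GB for half the month, then 3 GB, is billed as 2 GB.
    print(f"Billed storage: {average_storage_gb([(0.5, 1), (0.5, 3)])} GB")
```

Plugging in your own expected machine hours, operation counts, and transfer volume shows quickly which of the three cost components dominates for your application.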

Performance

To test performance, I ran a few tests of my own. First, I set up a Rackspace CloudServer and ran 20,000 operations against Azure Storage. I then repeated the test with 5 and 10 concurrent threads, each running the same 20,000 operations. After this, I set up an Amazon EC2 instance with the same specs as the Rackspace server and ran the same tests against SimpleDB. Finally, I ran the single-instance test against SimpleDB from Rackspace. The following table has the average access times (in milliseconds) for each test case.

Test case                1 instance     5 instances    10 instances
Rackspace to Azure       11.78112395    12.53629188    15.3170274
EC2 to SimpleDB          15.07166852    41.71508192    65.66343936
Rackspace to SimpleDB    44.2511335     not run        not run
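For anyone who wants to repeat this kind of test, the procedure described above can be sketched roughly as follows. This is not the code I used. The function names are made up, and `placeholder_storage_call` is a stand-in that just sleeps for a millisecond, so you would swap in a real request to whichever storage service you are measuring.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def time_operations(operation, count):
    """Run operation() count times and return each latency in milliseconds."""
    latencies = []
    for _ in range(count):
        start = time.perf_counter()
        operation()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def run_benchmark(operation, ops_per_worker, workers):
    """Run the same workload on several threads; return (average ms, total operations)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(time_operations, operation, ops_per_worker)
            for _ in range(workers)
        ]
        all_latencies = [ms for future in futures for ms in future.result()]
    return mean(all_latencies), len(all_latencies)


def placeholder_storage_call():
    """Hypothetical stand-in for one storage request; replace with a real client call."""
    time.sleep(0.001)


if __name__ == "__main__":
    for workers in (1, 5, 10):
        avg_ms, total = run_benchmark(placeholder_storage_call, 200, workers)
        print(f"{workers:>2} workers: {avg_ms:.2f} ms average over {total} operations")
```

Each worker runs the full set of operations, mirroring the setup above where every thread ran the same 20,000 operations, and the average is taken over all individual requests.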

Not surprisingly, access to SimpleDB is much faster from inside Amazon’s network than from outside (about three times as fast). A web search turns up references to how Amazon has optimized its internal network so that internal access to its services is much faster than outside access. To me, this looks like a form of lock-in rather than a desirable feature, but if you are willing to live with it, it can be a benefit, as they don’t charge for data transfer inside their own network.

What was surprising to me was the difference in scalability between SimpleDB and Azure Storage. Running multiple concurrent accesses against Azure barely increased the access time, from about 12 to about 15 milliseconds. With SimpleDB, the average went from about 15 milliseconds for a single instance to about 66 milliseconds for ten, more than a fourfold increase. While 65 milliseconds may seem like a very small time, for some applications it may be too much.
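The comparison is easier to see as ratios. The snippet below is a small sketch that takes the averages from the table and divides each one by the single-instance figure for the same test case. The helper name `slowdown` is just for illustration.

```python
# Average access times in milliseconds, copied from the results table,
# keyed by the number of concurrent instances.
RESULTS_MS = {
    "Rackspace to Azure": {1: 11.78112395, 5: 12.53629188, 10: 15.3170274},
    "EC2 to SimpleDB": {1: 15.07166852, 5: 41.71508192, 10: 65.66343936},
}
RACKSPACE_TO_SIMPLEDB_MS = 44.2511335  # single-instance test only


def slowdown(times_ms, baseline=1):
    """Return each average divided by the single-instance average."""
    return {n: ms / times_ms[baseline] for n, ms in times_ms.items()}


if __name__ == "__main__":
    for test_case, times_ms in RESULTS_MS.items():
        factors = ", ".join(
            f"{n} instances: {factor:.2f}x"
            for n, factor in slowdown(times_ms).items()
        )
        print(f"{test_case}: {factors}")

    # How much slower SimpleDB is from outside Amazon's network.
    outside = RACKSPACE_TO_SIMPLEDB_MS / RESULTS_MS["EC2 to SimpleDB"][1]
    print(f"Rackspace vs. EC2 access to SimpleDB: {outside:.2f}x")
```

With ten concurrent instances, the Azure average is about 1.3 times its single-instance figure, while the SimpleDB average is about 4.4 times its own.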

Conclusions

My recommendation is to run your own tests based on your expected workload. One thing to take into account is that Amazon today has a lot more users than Microsoft, which may eventually make a difference to these numbers. I’ll probably have to monitor performance and repeat these tests in the future. Amazon also has some additional benefits that may be of interest: a simpler management console and cheaper prices to start. In my case, I was looking for response times under 30 milliseconds for concurrent access, so that made Azure the clear winner.