Like moving house, packing up your data and shifting it from one cloud to another can be a stressful experience.
But not as fraught as you might think, says Jyrki Pulliainen, who as a software engineer at music streaming service Spotify has just overseen the move of about 1.5 billion files from Amazon Web Services (AWS) to Google Cloud Storage.
While Spotify stumbled over a few gotchas when shifting the 3PB of data, the process was completed in the past couple of weeks and without major setbacks.
Here’s what Spotify learned from moving from AWS Simple Storage Service (S3) to Google Cloud Storage, as part of its wider migration to Google Cloud Platform.
1. Check if your infrastructure is tightly coupled to your old cloud
Spotify baked the names of the AWS S3 storage buckets into the names of the keys for its data caches, in order to help with A/B testing.
“It turns out this is a bad idea when you migrate the bucket,” said Pulliainen, who added that Spotify had to pause the migration while it updated its cache keys.
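To make the coupling problem concrete, here is a minimal sketch of the difference between a cache key that embeds a physical bucket name and one that uses a stable logical name. Everything in it is hypothetical: the bucket names, the `BUCKET_FOR_STORE` mapping and both key formats are illustrations, not Spotify's actual scheme.

```python
# Hypothetical mapping from a logical store name to the physical bucket.
# Only this mapping should need to change when the data moves clouds.
BUCKET_FOR_STORE = {"audio": "example-audio-bucket-s3"}


def coupled_cache_key(store: str, object_path: str) -> str:
    """Anti-pattern: the physical bucket name is baked into the key,
    so every cached entry is orphaned when the bucket changes."""
    return f"{BUCKET_FOR_STORE[store]}/{object_path}"


def decoupled_cache_key(store: str, object_path: str, variant: str = "default") -> str:
    """Key built from a logical store name plus an explicit variant label
    (for example an A/B test arm), independent of where the data lives."""
    return f"{store}:{variant}:{object_path}"


if __name__ == "__main__":
    print(coupled_cache_key("audio", "tracks/123.ogg"))
    print(decoupled_cache_key("audio", "tracks/123.ogg", variant="b"))
```

With the decoupled form, pointing `BUCKET_FOR_STORE["audio"]` at a new bucket leaves existing cache keys valid, while an A/B variant can still be expressed in the key.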
2. Watch out for unexpected latency spikes
As part of the move, Spotify shifted data from an AWS datacenter in Ireland to a facility in Google’s US Central region, because most of Spotify’s customers are now in North America.
Following the switch, Spotify was surprised to see higher than expected latency when serving data to some US customers.
“The lesson learned here is that our private capacity points had IP addresses registered to Stockholm, so we were ping-ponging to Europe and ping-ponging back to US Central to fetch the files,” he said, speaking at the Google Cloud Platform Next conference in London.
“So if you do these kind of moves, remember to check where your IP addresses are registered and change them accordingly.”
3. Check how well your new cloud provider integrates with your CDN
Spotify uses three major content delivery network (CDN) providers, Fastly, Akamai and Verizon, to cache music locally for users across the globe.
While Spotify’s CDN partners offer Google Cloud Storage (GCS) support ‘out of the box’, there were some roadblocks.
A negative for Spotify is that its CDN providers “don’t really support” OAuth, the token-based authentication system that allows third-party services to access online accounts without knowing the user’s password.
“We would have liked to have had OAuth support. We had to create artificial Google accounts for our CDN providers. It’s a hassle for us, for our IT department, for our security and we get less fine-grained access control.”
4. Check what you’ll lose during the move
Each cloud provider has strengths and weaknesses, so you should expect some gaps. Check which features of your existing cloud provider are missing from, or weaker in, the provider you are moving to.
“Now that we’ve done the move, we are generally really happy but we have some wishes for the future.
“We would love to see cross-continental replication, like the fact we now run in a US Central regional bucket, that causes a slight latency for our European users.
“We ideally would like to have another bucket in Asia. At the moment, the GCS does not allow cross-continental replication.
“We’re also seeing slightly lower cold read latencies than with S3, the good news is that Google has improvements upcoming and we’ve worked with them to shear off 30ms.”
Pulliainen added that these latencies should be invisible to users, due to Spotify’s technical architecture.
Aspects of his specific complaints may have been addressed by Google with the introduction of the new Coldline and Multi-regional storage tiers this week, but his point of looking for gaps still stands.
5. Check how simple the transfer process will be
One of the most obvious things for firms to check up front is which tools are available to copy data between their old and new cloud providers, and how good these tools are.
Spotify moved some 1.5bn files — covering music, cover images and metadata — totalling about 3PB of data between S3 and Google Cloud Storage.
While moving this much data understandably took a couple of weeks, the actual process was straightforward, said Pulliainen, who described moving the 1.5bn files as nothing more than a matter of “clicking around a UI for a moment” and called the overall experience “super simple”.
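For a sense of scale, the figures quoted above can be turned into rough averages. This is a back-of-the-envelope sketch only: it assumes decimal petabytes (10^15 bytes) and reads “a couple of weeks” as 14 days, neither of which the article states.

```python
PETABYTE = 10**15  # assumption: decimal petabytes


def transfer_summary(total_bytes: int, file_count: int, days: float) -> dict:
    """Average file size and sustained rates implied by a bulk transfer."""
    seconds = days * 24 * 60 * 60
    return {
        "avg_file_mb": total_bytes / file_count / 1e6,
        "throughput_gbit_s": total_bytes * 8 / seconds / 1e9,
        "files_per_second": file_count / seconds,
    }


if __name__ == "__main__":
    # 3PB and 1.5bn files come from the article; 14 days is an assumption.
    summary = transfer_summary(3 * PETABYTE, 1_500_000_000, days=14)
    for name, value in summary.items():
        print(f"{name}: {value:,.2f}")
```

Under those assumptions the average file is about 2MB and the transfer sustains roughly 20Gbit/s, which is the kind of estimate worth making before committing to a migration window.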
Out of the 1.5bn files, 144 had transfer failures, but Pulliainen said it took Google less than one day to track down those that were missing.
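The article does not say how the missing files were tracked down. One common approach, sketched here under that caveat, is to compare an object listing from the source with one from the destination. The `reconcile` function and the listing format (object name mapped to size in bytes) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def reconcile(
    source: Dict[str, int], destination: Dict[str, int]
) -> Tuple[List[str], List[str]]:
    """Compare two listings of {object name: size in bytes}.

    Returns (missing, mismatched): names absent from the destination,
    and names present in both whose sizes differ.
    """
    missing = sorted(name for name in source if name not in destination)
    mismatched = sorted(
        name
        for name, size in source.items()
        if name in destination and destination[name] != size
    )
    return missing, mismatched


if __name__ == "__main__":
    # Toy listings; real ones would come from each provider's listing tools.
    src = {"tracks/1.ogg": 2_000_000, "tracks/2.ogg": 1_800_000, "covers/1.jpg": 90_000}
    dst = {"tracks/1.ogg": 2_000_000, "covers/1.jpg": 45_000}
    missing, mismatched = reconcile(src, dst)
    print("missing:", missing)
    print("size mismatch:", mismatched)
```

Comparing sizes catches truncated copies as well as absent ones; comparing checksums would be stricter where both providers expose them.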