AWS outage: Your response to AWS going down shouldn't be multicloud

Commentary: It's convenient to assume multicloud will solve your application resilience woes. Convenient, but wrong. Here's why.

You work in enterprise IT, so you're not really prone to joining Twitter's "#hugops" crowd when a cloud service goes down. This past week, the US-East region for AWS went down, and hard, leaving hundreds of millions of customers of Netflix, Disney+ and other online properties without service. Those enterprises didn't want hugs. They wanted a fix.

Sadly, multicloud isn't that fix. 

As Honeycomb co-founder Charity Majors has stressed, multicloud won't deliver the application resilience you want. And, perhaps even more pertinently to those knee-jerking their way to a multicloud fix for the US-East implosion, there are several imperative steps to take to deliver application resilience before you "fantasize about multicloud for availability," said Gartner analyst Lydia Leong.

Like?

Carts before horses

Before you start thinking about multiple clouds, it's best to get the first one right. That's the tl;dr of Leong's argument: "Before you even fantasize about multicloud for availability, you should be multi-AZ in multiple regions, and have maximized your resilience through proper application design/implementation, thoroughly tested through chaos engineering."
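To make Leong's ordering concrete, here is a minimal sketch of what "multi-AZ in multiple regions" plus a chaos-style test might look like inside a single cloud. Everything in it is hypothetical: the `Zone` and `Region` classes, the `route` and `fail_region` helpers, and the region and zone names are illustrative stand-ins, not any cloud provider's API. It is a toy model of the failover logic, not a production design.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Zone:
    """A hypothetical availability zone with a simple health flag."""
    name: str
    healthy: bool = True


@dataclass
class Region:
    """A hypothetical region made up of several zones."""
    name: str
    zones: List[Zone] = field(default_factory=list)

    def healthy_zones(self) -> List[Zone]:
        return [z for z in self.zones if z.healthy]


def route(regions: List[Region]) -> Optional[str]:
    """Return 'region/zone' for the first healthy zone, in priority order."""
    for region in regions:
        zones = region.healthy_zones()
        if zones:
            return f"{region.name}/{zones[0].name}"
    return None  # every zone in every region is down


def fail_region(region: Region) -> None:
    """Chaos-style fault injection: mark every zone in a region unhealthy."""
    for zone in region.zones:
        zone.healthy = False


def build() -> List[Region]:
    """Two made-up regions with two zones each, listed in priority order."""
    return [
        Region("region-a", [Zone("a1"), Zone("a2")]),
        Region("region-b", [Zone("b1"), Zone("b2")]),
    ]


if __name__ == "__main__":
    regions = build()
    regions[0].zones[0].healthy = False  # lose one zone: stay in region-a
    print(route(regions))
    fail_region(regions[0])              # lose the whole region: fail over
    print(route(regions))
```

The point of the exercise is the test, not the routing: if deliberately killing a zone or a region in a drill breaks the application, adding a second cloud provider would not have saved it.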

Even if you're doing all this, there may still be no easy answers. One person responding to Leong's tweet noted, "The issue is usually state. Replicating your primary database to another region is expensive. Also the [AWS-East] impact seems networking related. Networking faults can in rare cases cause blackholes that are hard to isolate to a single AZ." Some of that may complicate life for the cloud provider, and some for you. 

And all of it falls on IT departments that are stretched thin. As Leong suggested in a follow-on tweet, "It's all too easy to talk about what people should do. Most IT people deal with non-ideal situation[s] where they have inadequate people, skills, time, and money to enact good practices. They usually know they're taking risks. Cost of risk deemed less than cost of mitigating risk."

In a separate blog post, Leong piled on:

Multicloud failover requires that you maintain full portability between two providers, which is a massive burden on your application developers. The basic compute runtime (whether VMs or containers) is not the problem, so OpenShift, Anthos, or other "I can move my containers" solutions won't really help you. The problem is all the differentiators — the different network architectures and features, the different storage capabilities, the proprietary PaaS capabilities, the wildly different security capabilities, etc. Sure, you can run all open source in VMs, but at that point, why are you bothering with the cloud at all?

In other words, before you go multicloud, get your single-cloud house in order. Except that you may have to settle for a somewhat ramshackle "house" due to budget and other resource constraints. Oh, and even if you magically have all that in order, successfully managing a multicloud environment is not for the faint of heart (or wallet).

This is not to say running in multiple clouds is a bad idea. Most SaaS providers, for example, offer multicloud options because they have to: Customers prefer to run on all sorts of infrastructure clouds. The SaaS providers aren't going to turn those customers away, at least so long as they're running on one of the Big 3 cloud providers (AWS, Microsoft Azure, Google Cloud). For better or for worse, most companies also build their own applications on multiple clouds, according to a recent O'Reilly survey. That's not surprising, as swipe-credit-card-get-cloud convenience has enabled developers to spin up whichever cloud services they need.

Coming back to Leong's original point, there's a lot of work to do to enable resilience, and it starts with a single cloud, not several. And, yes, you will run multiple clouds — it's just how IT works. But using multicloud for resilience…? You probably don't want to go there.

Disclosure: I work for MongoDB, and used to work for AWS, but the views expressed herein are mine alone.
