
How companies can prepare for cloud outages

When your cloud service goes down, you learn a hard lesson pretty quickly. Here's what smart companies do to prepare for an outage.

When Amazon Web Services suffered its high-profile cloud outage in April, the company was not prepared. Other big-name properties like Reddit and Foursquare were caught off guard as well.

While the cloud is being touted as this magical commodity that lets you store your data where you want without all the hassle, the fact is you give up a lot of control and security. You learn this lesson pretty quickly when your cloud service goes down. Smart IT managers are prepared for possible outages.

Stan Klimoff, Director of Cloud Services for Grid Dynamics, talks about how companies should prepare for cloud outages.

1. Institute disaster drills. Drills expose the weak points of the system and prepare the staff to face a real disaster.

2. Design for failure. Follow the rule that everything that can fail, will fail. This will help your team better prepare for possible outages. Designing for failure is an expensive engineering practice, but it definitely pays off in the end.

3. Have a "Plan B." You have to be able to take your stuff and move away from an ISP within hours. Design your system to avoid vendor lock-in.

4. Know your Service Level Agreements. The recent cloud outage that Amazon experienced was actually caused by the EBS (Elastic Block Store) service, whose stated availability is on par with that of a local disk. No one should trust a single local disk with a critical database.
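To see why a single disk is a poor home for a critical database, some rough availability arithmetic helps. The sketch below assumes failures are independent and that one surviving copy is enough to keep the service up. The 99.9% figure is purely illustrative (it is not Amazon's published number), and the function names are made up for this example.

```python
HOURS_PER_YEAR = 24 * 365


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year for an availability between 0 and 1."""
    return (1.0 - availability) * HOURS_PER_YEAR


def redundant_availability(availability: float, copies: int) -> float:
    """Availability of `copies` independent replicas when one surviving copy suffices."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The service is down only when every copy is down at the same time.
    return 1.0 - (1.0 - availability) ** copies


if __name__ == "__main__":
    single = 0.999  # hypothetical availability of one disk or volume
    for n in (1, 2, 3):
        a = redundant_availability(single, n)
        print(f"{n} copies: availability {a:.9f}, "
              f"about {downtime_hours_per_year(a):.4f} hours down per year")
```

Real failures are rarely fully independent (copies in the same data center can go down together), so treat the redundant figures as a best case rather than a promise.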

About

Toni Bowers is Managing Editor of TechRepublic and is the award-winning blogger of the Career Management blog. She has edited newsletters, books, and web sites pertaining to software, IT career, and IT management issues.

24 comments
tbmay

...can always be set up. But I still think there's good reason to not get into the cloud. Actually there are 3 reasons: security, reliability, AND too much reliance on providers (both cloud vendors and ISPs). Look at the ever-changing pricing of mobile plans as an indicator of the risk there. If we ever get to a point where everybody is doing all their productivity work in the cloud, there would be very little to stop a few providers from effectively owning them by virtue of what they could charge. That's a big-picture thought, but I see it as a valid one with business after business diving into the cloud.

abhinavkaiser

The main infrastructure that makes clouds work is ISPs and servers. Companies relying on the cloud must have a redundant ISP waiting in the background to take over. Or two ISPs, maybe three, could be used to share the load, and when one goes down, the others can take over. Coming to servers, the same logic must be applied. Have redundant servers for automatic failover. Real-time data replication is necessary. These failovers come at a cost, but that cost is a fraction of what non-cloud computing would cost.
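The "others can take over" idea boils down to picking the first healthy link from a priority list. Here is a minimal sketch of that selection logic; the link names and the health check are hypothetical placeholders, and a real setup would do this in routers or DNS rather than in application code.

```python
from typing import Callable, Optional, Sequence


def pick_active_link(links: Sequence[str],
                     is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy link in priority order, or None if all are down."""
    for link in links:
        if is_healthy(link):
            return link
    return None


if __name__ == "__main__":
    links = ["isp-a", "isp-b", "isp-c"]  # hypothetical names, highest priority first
    down = {"isp-a"}                     # pretend the primary has failed
    active = pick_active_link(links, lambda link: link not in down)
    print("active link:", active)
```

The health check is passed in as a function so the same logic can be exercised in a disaster drill with simulated failures.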

brian

It sounded so neat: get someone else to look after your IT. But the Business needed to understand, and the idea was so fantastic, "just think about all the money that we will save..." drooled from the lips of the Accounts, the Senior Management Chain and all those guys who could smell a fast buck and thought that most of those bucks could find their way into their paycheck, directly or indirectly. No one was responsible for it all and all sorts of strange things happened; undocumented mission-critical software just fell out of the sky. The Owner has to look after his Business. I have never been in favour of the Cloud ever since, some 25 years ago, people were trying to do a Cloud Thing to me and the Business that I was engaged in. Other groups of interests got together and found some budget to do a little cloud thingy of their own, and on a critical bid, someone pressed the print key and a strange silence enveloped the whole team - yes, no one had addressed size adequately. Every business needs to understand its Critical Mission needs and its Disaster Recovery needs, and the liabilities of its business to the Cloud or whatever you wish to call it. I would start from the position that no Cloud could be safe enough to entrust my business to. Never. If someone wants to make me a proposition, I would need to know a lot about their proposition, in great detail, before I would even give it time to consider. If I sound like someone who has been burned, you would be right, but that is the attitude that we should all take to a new proposition that sounds too good to be true. Your Business depends upon you and your team making the right decisions FOR YOUR BUSINESS. Sleepy

ronan

Funny that Amazon's cloud went bottom up in Europe West on Sunday evening too, due to a claimed lightning strike and power loss. Our services took 48 hours to recover. Luckily our applications developer was still working closely with us, and we got our stuff back up and running after 48 hours; otherwise I fear it would have been much longer. AWS is still reporting issues almost 62 hours later on status.aws.amazon.com/eurpoe, which means there are still major issues for many customers.

tbmay

Those questions just won't go away. I know people, businesses even, who rely on Google Apps. I use it from time to time myself for documents that don't have any sensitivity or privacy concerns. If the world sees it - no harm, no foul. Also, it's not a big deal if my WAN link is down, or Google is working on something, or the stars line up wrong, or whatever. But I just can't warm up to the idea of anything sensitive being on "some server somewhere" where I don't even know who works there. Paranoid? Maybe. But I'd bet dollars to donuts someone is going to get burned one day.

vjain

For data-intensive and critical applications, like your file server, the cloud has to be hybrid. Having an on-premise copy of your files on commodity hardware, automatically synchronized bidirectionally with the cloud, addresses the problems with pure cloud solutions. Technical issues like latency, group offline access, and last but not least, peace of mind, are best addressed with the hybrid approach. I truly believe that the ground realities of the cloud require it to be hybrid.

Tony Hopkinson

In fact the cloud is essentially a disaster-proof architecture, well, at least as far as tech and money will stretch. It's one of the things you should be buying, and it should be explicit in the SLA. Nicely timed post, as I'm pretty sure people aren't considering these issues, nor am I convinced that providers are mentioning them. There can't be a one-size-fits-all approach to this. "Get your data back so you can switch to another ISP" presumes they provide that same service with the same tech. Standard stuff like email will probably be okay, heavily customised business software like CRMs, purchasing, ledgers, payroll, etc.... Part of the DR you need to do is to make sure there is a viable alternative to go to; if there isn't, you just took on a lot more risk. As a builder of the above sort of applications, I wouldn't be taking that one for granted. You have to remember the cloud is a different platform, and that the offerings that are out there now are different ones; migrating from one to the other is as problematic an exercise as, say, switching operating systems, moving to or from Exchange or Oracle, or going from Fred's Accounting package to Jim's. I've done many migrations and ports in my career; there's a point where you have to choose between amending the way you operate or amending the software. There's also one where you have to say we can't get this data into the new system, and we'll have to lose it or re-enter it. The only way to discover how much is to do it, and that's expensive.

HAL 9000

To cut costs and make life so much easier for companies so that they don't have to do this stuff to begin with? Are the companies who use "The Cloud" not supposed to be saving money while not having any of the expense of requiring Disaster Plans, Backups and Single Point of Failure issues? If they need to expend funds for this, what is the benefit of moving away from Local Hardware that they have control of and pay to maintain? Just my thoughts, as the supposed Benefits of "The Cloud" are supposed to do away with all of this, even if to me it just sounds like an Enlargement of the Dumb Terminals that we used to use when Computers were first adopted by business, and which proved a failure then as well. ;) Col

Spitfire_Sysop

"The Cloud" should only be used as a backup for your own private cloud. If there is a problem with your shop, like a hurricane, then you have uninterrupted service with a live backup.

Tony Hopkinson

kit and people, we'll do it instead. If you keep your onsite stuff, most of the argument for doing it at all, goes completely in the bin except for those requirements where the cloud is the best solution platform wise. Those are a bit thinner on the ground. Sort of like using public transport as much as you can, but keeping a car for emergencies.

pfeiffep

for individual companies certainly is not a responsibility of the cloud vendor. Mission critical data should not be uniquely cloud based. Cloud technology is certainly convenient, but NOT bullet proof.

ananthap

Users shouldn't have to bother with things like availability. Disaster recovery (automatic failovers) and a workable plan B should be built into all cloud contracts.

Ron_007

To off load from your location to the internet. Off load things like hardware, software, manpower. If you think it is going to be cheaper, guess again. You are simply moving from in-house to "out-house" capital purchases. You are paying someone else to buy and maintain your hardware and operating system. They may be able to take advantage of savings based on the scale of their operation compared to yours, but they won't give you all of that advantage! They are companies that have to make a profit too. You are paying for that profit. People forget that the cloud is just another computing platform. It is NOT a disaster-proof, automatic-failover magic wand. If you don't do your homework to learn about cloud computing in general and the specific system you are using, if YOU don't design it to be "disaster proof", if you believe everything a SALESMAN tells you, if you don't read the "fine print" (before signing), your cloud will rain on you! The Amazon failure could have been avoided or minimized by properly architecting the use of the cloud rather than depending on Amazon to do it all. This article describes the specific failure: http://www.channelregister.co.uk/2011/04/27/rackspace_on_amazon_ec2_outage/ . The article also mentions that a lot of companies did not even get refunds for the downtime because they did not read the fine print and did not design their cloud properly!

gechurch

I'm very interested in this option. I definitely see the cloud as having potential to improve DR drastically. Does anyone have experience in doing this, or know of cloud providers with solutions designed as a mirror? I consult for a bunch of small companies, mostly running Small Business Server. I'd be most interested in having Exchange mirror elsewhere, and same for files. I'm not too interested in having apps mirrored - just in getting quick access to files and email if the worst happened. Anyone got experience or suggestions?

Tony Hopkinson

cloud to will be heavily dependent on it already, and as such should have a provision for its loss now. They'll lose more functionality when it goes down, so that makes it a higher impact, but in terms of some eejit digging their cable in half, there's no increase in the incidence of risk, with the possible exception of brown-out on their bandwidth. If you go cloud big style, of what use to you is immediate access to your data media locally? You've no machine to put it on, no software to access it, no people to do it. That's the cost saving the cloud boys are selling this on. Using the cloud as part of your DR plan isn't on the menu. The main reason I like this post is I'd be surprised if half the sensible target market even have a DR plan. Well over half of the plans of those who do have them will be useless, or broken. How many times have you heard of someone finding out that their backup is useless during the restore process? Usually the need to restore was down to the next amusing anecdote about a "dumb user" as well. Those people are probably at less risk in the cloud. I have some very serious concerns with the cloud-instead approach, but some big hairy bloke leaning on his pneumatic drill a bit too hard isn't high on my list. A lot of these guys could muddle through critical stuff on their PDA. Medium to large are more likely to offer a cloud than be in one.

HAL 9000

The Cloud is not a solution as it has been portrayed, particularly by the Mass Media; it's all Marketing Hype with no Fact behind it. The only possible cost saving is in the initial purchase, but with the Monthly Subscription that soon disappears, if it ever even existed, and the companies who Embrace the Cloud end up paying even more than they did previously, with no control over their Data and no real expectation that their Cloud Provider even knows what it is that they are supposed to be doing to protect their Customers. Amazon was a perfect example of a company who should have known better, who most likely made a Commercial Decision based on what their Accountant told them and failed miserably to maintain their systems in even the most basic Security Configuration. What the above says to me is that you need to have an even better Disaster Recovery Solution and an alternative hardware/net provider that you can switch to within minutes when your Primary provider disappears for whatever reason with your current Data, which either their Accountant or Receiver will most likely sell on to your competition at their earliest convenience. Seems to me that this is considerably more expensive than maintaining their own Hardware solution. There are no initial Cost Savings in going to the Cloud, and way too many reasons not to go anywhere near any Public Cloud in any way for anything for business. It has been shown that these Public Cloud Providers cannot guarantee even the most basic Security that their customers rely upon to remain in business, let alone offer Superior Protection to what the Company could provide themselves. To me the only way forward with any Public Cloud is for completely no Security and allowing every Boy and their Dog access to everything. That way no one would want to cripple the Public Cloud, as they would gain no benefit. Hardly something to boast about in Sales Literature though, is it. :D Col
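Whether the subscription really eats the initial saving, and how soon, can be checked with back-of-envelope arithmetic. This is a rough sketch only: the function name and every figure in the example are invented for illustration, and it ignores things like hardware refresh cycles, staffing changes and tax treatment.

```python
import math
from typing import Optional


def break_even_month(upfront_cost: float,
                     own_monthly: float,
                     cloud_monthly: float) -> Optional[int]:
    """First month in which cumulative cloud spend reaches cumulative in-house spend.

    In-house spend after m months is upfront_cost + own_monthly * m.
    Cloud spend after m months is cloud_monthly * m.
    Returns None when the subscription costs no more per month than running
    your own kit, because the cloud then never becomes the dearer option.
    """
    if cloud_monthly <= own_monthly:
        return None
    return math.ceil(upfront_cost / (cloud_monthly - own_monthly))


if __name__ == "__main__":
    # Hypothetical figures: 12,000 of hardware up front, 200 a month to run it,
    # against a 700 a month subscription.
    print(break_even_month(12_000, 200, 700))
```

With these made-up numbers the subscription has cost as much as owning the hardware after two years. Plug in your own quotes before drawing any conclusion either way.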

gechurch

I'm in the exact same position - I consult for small businesses, and am currently looking at using the cloud for business continuity/disaster recovery, and like you, Exchange is my main concern. Google have the exact offering I am wanting for Exchange: Google Message Continuity. It's aimed at businesses with an in-house Exchange server. The idea is all mail goes through Postini (Google's spam filter; you point your MX record here permanently), which then forwards all non-spam mail to both Gmail and your local Exchange server. If Exchange goes down, mail continues to flow to Gmail without needing MX record changes and propagation, and you still have access to historical mail. Once Exchange is back up, they have a feature to sync mail sent using Gmail back to Exchange. It costs $25/user per year. You may want to add Message Discovery to that if you want protection as well as continuity. That adds $13/user per year. I was all set to begin recommending this product to my clients, but was recently recommended Webroot. They offer pretty much the exact same product, but it sounds like their anti-spam might be a bit better. I haven't traditionally been 100% happy with Postini using Gmail - I have had the odd false positive (although recently it's seemed better). Their offering is at http://www.webroot.com/En_US/brochures/email-business-continuity.html. I believe pricing is about the same as Google.
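For budgeting, the per-user prices quoted above are easy to turn into an annual figure. A minimal sketch, taking the $25 and $13 per user per year from this comment at face value (actual pricing may differ) and using a made-up function name:

```python
CONTINUITY_PER_USER_YEAR = 25  # USD per user per year, as quoted in the comment
DISCOVERY_PER_USER_YEAR = 13   # USD per user per year for the optional add-on


def annual_cost(users: int, with_discovery: bool = False) -> int:
    """Yearly cost in USD for a given number of mailboxes."""
    if users < 0:
        raise ValueError("users must not be negative")
    per_user = CONTINUITY_PER_USER_YEAR
    if with_discovery:
        per_user += DISCOVERY_PER_USER_YEAR
    return users * per_user


if __name__ == "__main__":
    for users in (5, 20, 50):
        print(users, "users:", annual_cost(users), "USD, or",
              annual_cost(users, with_discovery=True), "USD with Message Discovery")
```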

Tony Hopkinson

call IT proficient and don't make use of the control they have. They regularly get taken out by viruses, kit failures, patch screw-ups and so forth. A truly shocking number don't have usable backups, and then call a "consultant" out to cross their fingers for them. Basically the cloud vendor is more trustworthy than them. If they do some DR, at least they'll know how much they need, whether they choose cloud or not.

JamesRL

We have some clients who have enough work in their organizations to keep an IT technician busy among their several sites (which are often served by one server in their biggest office). For those ones it's a numbers game. The ones that can get the biggest benefit are the ones in the middle of nowhere, who are 4 or 5 hours away from getting a hardware tech out to their site if the server crashes and the RAID has to be rebuilt. They don't have a dedicated tech. The one person they trained to do the backups daily gets busy, or forgets to read the logs (to see the failures or bad tapes). They lose the patch CDs. They stop backing up when the trained person goes on vacation (often for two weeks). They benefit greatly from having everything run remotely, where backup and disaster recovery are someone else's problem.

Tony Hopkinson

Most businesses I've worked at have had two links, even in pre-internet days, with two of what used to be called KiloStream circuits, high-speed dedicated phone lines basically. Course, both boxes to do that were on the same rack shelf... :( Given you are using someone big enough cloud-wise to have their own major failover provision (which is really the essence of the cloud), connectivity shouldn't be a big thing. The one that hits them is the trust issue, because if you say right, I'm keeping my kit to run critical functions X and Y, keeping your non-critical ones as well is not going to stack up as a saving. Even the ones that can go for it will be reluctant to do so without recouping the loss they'll make on the kit they have. They might wait for it to depreciate, but these are the guys still using MSDE, with VB6 on Win98... I'd be trying to sell cloud as a value-add option as much as, if not more than, a cost saver myself.

JamesRL

My company has been selling solutions that were "cloud" based before the word was so popular. It's now by far the most common choice. We do offer customers a choice: if you want to have the hardware on your site, feel free, but you buy the hardware from us, and you probably want us to do the maintenance. Someone on site from your staff has to do the backups, including swapping tapes. Or the popular choice is to go to a remote model. We have two datacenters that replicate with each other, so if an act of God hits one of them, you can probably connect to the other. In fact it is transparent to the user, so you shouldn't even realize it. Each datacenter has at least 2 ISPs, so if one of them has an act of God..... But we also recommend that the dealer get two ISPs. The second could be a "minimal" connection, where you ban internet surfing or running huge reports in the middle of the day, if you have to use it at all. And you have to ensure that the 2 ISPs aren't using a common carrier - if they both lease lines from the same phone company, then you really don't have any redundancy.
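The common-carrier trap is easy to check for once you have asked each ISP whose lines they lease. A small sketch of that check follows; the ISP and carrier names are hypothetical, and the data would have to come from the providers themselves.

```python
from typing import Dict, Set


def shared_carriers(upstreams: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each carrier used by more than one ISP to the ISPs that depend on it."""
    users: Dict[str, Set[str]] = {}
    for isp, carriers in upstreams.items():
        for carrier in carriers:
            users.setdefault(carrier, set()).add(isp)
    # Keep only the carriers that are a shared point of failure.
    return {carrier: isps for carrier, isps in users.items() if len(isps) > 1}


if __name__ == "__main__":
    upstreams = {
        "isp-a": {"telco-x"},
        "isp-b": {"telco-x", "telco-y"},  # leases from the same phone company as isp-a
    }
    overlap = shared_carriers(upstreams)
    if overlap:
        print("No real redundancy, shared carriers:", overlap)
    else:
        print("No shared carriers found")
```

An empty result only means no overlap in the data you were given; it is worth re-asking whenever a provider changes its upstream arrangements.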

Ron_007

Think again, there certainly is an often large saving in going to the cloud. At least it appears that way. There are no initial capital expenses for hardware, software, and physical location. For large companies running large applications that can be a huge expense for bean counters and "Bored" of Directors to look at and worry about. Instead, that initial cost is amortized into monthly rental costs (that of course include profit margins for the provider). Few people look beyond that initial "saving" to do the math and figure out the true long-term cost. But that is the same decision you make any time you decide to outsource any part of your company. And in some places there can be tax benefits when you change from capital expenditures to ongoing "expenses" (monthly payments). One cost no one considers is the loss of corporate knowledge. If you "rent" someone else to do your work, you are paying them to learn the intimate details of your IT system. It is much easier for a "rented" person to move to a new job, so your IT system ends up being supported by much less depth of knowledge than if you did it all in-house. Tony H: I think DR planning becomes more important simply because most people, especially Sr Execs!, ASSUME that the cloud is High Availability. The Amazon failure proved that most of the Amazon cloud continued to work. Amazon did do the initial DR planning, but their customers did not understand what they had to do when they purchased Amazon services to take full advantage of Amazon's DR plans. As a result, a few lost service. Your point about moving from one cloud provider to another is a good one. I too have lived through many migrations and ports. I do disagree slightly with your comparison. I've found that with enough "blood, sweat and tears", you can almost always get the data migrated. Where the relevant pain comes in switching cloud providers is translating application programs, functions and features in the various "old" and "new" software you use. That is often where you have to say, "we just can't do that anymore". Fundamental things like the difference in ASCII and EBCDIC character sort order can be painful when the user sees something slightly different. Every time you create custom programming to recreate old OS or application-specific features you create long-term hidden expenses. After a few years programmers don't remember, or never learn, why it is done in that inefficient way. So they just follow the old example and perpetuate bad code. Taking your point about security one step further, for any company outside of the USA, the cloud exposes them to data theft by the US government. If the cloud company has any sort of ownership association with US companies, the "Patriot" Act allows the US government to get copies of the data, without notification to the data owner. Of course, that is just the most blatant example. That may sound paranoid, but when anything can be labeled a "terrorist act" it's not that far out. And with the corporate ownership web so inter-related, it is very difficult to identify where ultimate ownership lies. Many years ago I saw a chart of all (?) the corporate holdings that the Coca-Cola corporation had. It was a huge spiderweb, many levels deep, that spread into many unlikely-seeming subsidiaries.
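The ASCII versus EBCDIC sort-order difference is easy to see for yourself. The sketch below sorts the same strings by their byte values in each encoding, using Python's built-in `cp037` codec as a stand-in for EBCDIC (there are several EBCDIC code pages; `cp037` is just one common variant, and the helper names are made up).

```python
from typing import Iterable, List


def sort_ascii(items: Iterable[str]) -> List[str]:
    """Sort by ASCII byte values: digits, then uppercase, then lowercase."""
    return sorted(items, key=lambda s: s.encode("ascii"))


def sort_ebcdic(items: Iterable[str]) -> List[str]:
    """Sort by EBCDIC (code page 037) byte values: lowercase, then uppercase, then digits."""
    return sorted(items, key=lambda s: s.encode("cp037"))


if __name__ == "__main__":
    sample = ["b", "A", "1", "a", "B", "2"]
    print("ASCII order: ", sort_ascii(sample))
    print("EBCDIC order:", sort_ebcdic(sample))
```

A report that listed part numbers before names on the old system will list them after names on the new one (or the other way round), which is exactly the kind of "slightly different" that users notice.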

gechurch

I just realised I replied to my own question three days later! What a dope!
