The best-laid plans
The primary site was due to undergo an eight-hour power shutdown for electrical maintenance that weekend. In anticipation of this, my client opted to test their ability to "flip" email processing to their DR site so email services would continue to run during the outage. This process was called a "datacenter switchover," and they had tested it before. It involved taking the primary Exchange mailbox servers out of the DAG, then activating the mailbox and Client Access Server (CAS) roles in the DR site.
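For readers who haven't performed one, an Exchange 2010 datacenter switchover is driven from the Exchange Management Shell. Here's a rough sketch of the documented steps using placeholder names (DAG1, the "Primary" and "DR" Active Directory sites, database DB1, server DR-MBX1); Rick's exact commands weren't recorded, so treat this as illustrative rather than a transcript of what he ran:

    # Mark the primary site's DAG members as stopped.
    Stop-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite Primary

    # Stop the Cluster service on each DR-site DAG member so the cluster
    # can be re-formed around the surviving servers.
    Stop-Service clussvc

    # Re-form the DAG using only the DR-site servers.
    Restore-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite DR

    # Activate a database copy on the DR mailbox server.
    Move-ActiveMailboxDatabase -Identity DB1 -ActivateOnServer DR-MBX1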
The switchover was expected to take 30 minutes of work, and users had been informed to expect a 3-5 minute window of email downtime. The system administrator, who I'll call Rick, started the process during the lunch hour to minimize the impact on users. While this type of work seems better suited for late at night, this company's policy was to perform such activities during the day, while staff are present, so any issues can be addressed as quickly as possible. As it turned out, issues appeared immediately.
The "Uh-Oh" moment
Rick kicked off the switchover by running an Exchange PowerShell command to take his primary Exchange servers out of the DAG. He then ran a command to activate the Exchange mailbox server in his DR site - and immediately hit a brick wall. Exchange was happy enough to pull out the primary servers but refused to bring up the DR Exchange server. Errors indicated the mailbox databases couldn't be mounted on the DR server because they were supposedly already mounted. After trying a few more commands, Rick found his DAG completely stuck in the mud. He couldn't bring up the Exchange databases in either site, nor re-add his primary Exchange servers to the DAG, since he received errors that they were already part of a cluster. Email was completely dead in the water, and almost an hour rapidly disappeared into frantic event-log reviews and Google searches.
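The failure state Rick describes can at least be inspected from the shell. A couple of read-only cmdlets (again with placeholder names) report what the DAG believes about copy state and membership, which is where any troubleshooting would start:

    # Show the state (Mounted, Healthy, Failed, etc.) of every database
    # copy on the DR mailbox server.
    Get-MailboxDatabaseCopyStatus -Server DR-MBX1

    # Show DAG membership and live status: operational servers, witness
    # server, and the current Primary Active Manager.
    Get-DatabaseAvailabilityGroup -Identity DAG1 -Status | Format-List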
Rick did the smart thing and contacted Microsoft support. They resolved the issue - sort of - by forcibly mounting the Exchange databases in the DR site and using the Failover Cluster Manager application to get the primary site Exchange servers back into the DAG. After three hours of downtime, that got email up and running - or did it?
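Based on Rick's description, Microsoft's fix likely boiled down to something like the following (placeholder names again; the exact commands weren't captured, so this is a guess at the shape of it rather than the support engineers' actual steps):

    # Force-mount a database on the DR server despite the "already
    # mounted" errors - this can discard unreplayed log data.
    Mount-Database -Identity DB1 -Force

    # After rejoining the primary servers to the cluster (done in this
    # case through Failover Cluster Manager), restart them in the DAG.
    Start-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite Primary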
I followed up with Rick later to ask how things went. He was understandably frazzled and not a little cynical about the experience.
"We found out we couldn't run email in our DR site that night," he told me. "We tested this by powering down the Exchange servers in the primary site, and nobody could open Outlook - even though our mailbox and CAS systems were supposedly working fine in the DR site. That problem lasted until we powered up the primary site servers again."
"So, what happened next?" I asked.
"Well, we still had the scheduled power shutdown to deal with, and we HAD to make sure email worked in the DR site or the company would suffer. We brought in dedicated Exchange 2010 consultants to try to figure out what went wrong with the DAG. It seemed to be okay but we didn't find any 'smoking gun' that explained what why the initial switchover bombed - even after combing through endless event logs and reviewing the Exchange best practices analyzer. Our best guess is there was a synchronization problem among the mailbox databases which confused the DAG, but the databases also seemed okay again. So, we were down to two choices: try that same datacenter switchover process over again or come up with something new."
I could tell by his tone that they had gone with the latter option. "Why were you reluctant to follow the same process?" I asked.
"Because we had no way of knowing it would work," Rick said. "I already took enough heat for the three hours of downtime - without being sure that the problem was fixed I flat-out refused to go down that road again. We did find out why Exchange failed when our primary site servers went down. We have three servers in our primary site and two in our DR site. The DAG uses a concept called 'quorum' whereby each server has a vote as to where services are live if something bad happens - and the majority of votes in either site makes the call. Since we shut down the site with the majority of votes the DAG was DOA."
"Sounds needlessly complex," I observed.
"It gets more so. Instead of doing the datacenter switchover our consultants recommended we install a dummy Exchange mailbox server in our DR site so we could successfully activate our mailbox databases there."
"On the dummy mailbox server?" I asked.
"No, on the actual production mailbox server, but we needed the additional mailbox system so Exchange could establish a 'quorum' in the DR site of three servers to allow email to work. I was skeptical - extremely skeptical - that this would succeed, since it seems to me after 10+ years of supporting Exchange that it waits for any available opportunity to break down and stab you in the back."
Let's analyze that skepticism for a second. Rick had gotten so disenchanted with Exchange that he doubted whether documented procedures and the Exchange software itself would work as expected. If a system administrator can't rely on his own systems, I'd say that represents a significant crisis of faith.
"And did that work?" I asked.
"Happily, yes. We got through the power outage after activating our mailbox databases in the DR site. As far as I'm concerned, though, it's time to look at other solutions. Weirdly enough, just the week before I rejected talk about moving email up to a hosting service since I wanted to keep it in-house. Then this happened."
"Seems like it all worked out in the end," I said.
"Well, after three hours of email downtime and having to pay outside consultants, sure," Rick replied. "However, look how much we lost. People were sitting around unable to read or reply to urgent customer messages. My group suffered a reputation blow since we'd announced only five minutes of downtime. Furthermore, the irony here is that we built an expensive fault tolerant Exchange environment that blew up the second time we tried to test a DR scenario!"
"I guess you have to be a full-time Exchange guy to manage this stuff effectively," I noted.
"That's what it boils down to. Oh, I probably could have found there was a problem before that datacenter switchover if I'd been more careful - I admit that. But I've got a bunch of other stuff going on - Citrix, monitoring, security, you name it. I don't have time to babysit Exchange. I could have been working on rolling out a new app virtualization project, but instead put thirty hours into this - not to mention all the 'Outlook is slow', 'I can't connect to email from outside the company,' 'I lost my PST file' stuff that has taken so much of my days. I've put up with about 48 hours of email downtime in my career - believe me, there's nothing less fun. I'm thinking now it's better to just move this out of the data center and have done with it. As far as I'm concerned there's no future in Exchange."
"I suppose there is for hosting providers," I mentioned.
"I mean for my career. I used to be leery of anything that seemed like a threat to IT staff. They came out with outsourcing and we all thought we'd lose our jobs. They developed virtualization and we all thought that would reduce IT headcount. Now there are hosted email systems and if anything now I think it would free me up to do the more meaningful things I just discussed. There is always going to be work for an IT pro; the question is whether it's worthwhile or not. Sure, there will still be some measure of downtime no matter where your data and services are - that goes with the territory - but the next time it happens I don't want to be the guy in the trenches when the dedicated experts ought to be there instead. My sanity bank account is overdrawn."
Rick also related other concerns about the reliability of Active Directory and Windows Server 2008, and expressed doubt that Microsoft has a clear understanding of how to produce forward-thinking software of genuine value.
Current statistics and future trends
I did some digging after talking to Rick and found some interesting statistics that vindicate his perspective. Technology research firm Gartner predicts that by the end of 2014 "at least 10 percent of enterprise email will be based on a cloud or software-as-a-service model," and expects that share to rise to at least 33 percent by the end of 2017.
According to a whitepaper from Rackspace.com titled "The Case for Hosted Exchange," the following (Figure A) shows the monthly total cost of ownership (TCO) figures for on-premises versus hosted Exchange:
Google Apps for Business pricing is even simpler. (Figure B)
Personally, I prefer Google Apps' hosted email solution over Microsoft's - not because I write for the "Google in the Enterprise" blog, but because I agree with some of Rick's concerns about the ongoing relevance of Microsoft and find Google's pricing scheme more attractive.
In both cases there is no server hardware, in-house software, data center cost, or backup expense. Users access their data from any location using an array of devices. A scheduled power outage would have meant little to nothing if hosted email had been in place at Rick's company.
Making the call
Rick's change in perspective was probably based on both emotion and logic, but I think he operated from a rational basis in both categories. Email maintenance is increasingly perceived as work that is more "custodial" than "innovative." IT professionals are engaged in an ongoing evolution toward bringing value to the business. When it comes down to it, what improves a structure more - mopping the floor or building an addition?
Hosted email isn't a magic solution completely free of drawbacks. There are still significant issues such as data migration, access configuration, user training, security, compliance, and SLAs to contend with. The pros and cons must be carefully weighed by all decision makers, whether in IT, Finance, HR, or other relevant departments. When properly planned and executed, however, it's not just the data and services that get shifted out of the organization, but the headaches and distractions as well - and it may even rekindle a system administrator's creative love for technology.
In this case the facts - and not marketing propaganda or website advertisements - convinced Rick of the reality of his situation and pointed to the next step in his approach to the future. It will be interesting to see how it pans out for him, as well as for organizations elsewhere.
Scott Matteson is a senior systems administrator and freelance technical writer who also performs consulting work for small organizations. He resides in the Greater Boston area with his wife and three children.