A new perspective on hosted email services: One system administrator's shift in outlook

The concept of cloud services can seem threatening to in-house IT staff. Learn how one system administrator weighed the facts and altered his views.

I recently visited a client site to perform some virtualization consulting work. My client relies on in-house Exchange 2010 servers which are configured in a (supposedly) fault-tolerant database availability group (DAG) across their two sites: the primary and the secondary disaster recovery (DR) location. During my stay I got a front row seat to some serious issues which occurred during a scheduled Exchange switchover.

The best laid plans

The primary site was due to undergo an eight-hour power shutdown for electrical maintenance that weekend. In anticipation of this my client opted to test their ability to "flip" email processing to their DR site so email services would continue to run during the outage. This process was called a "datacenter switchover" and they had tested it before. It involved taking the primary Exchange mailbox servers out of the DAG then activating the mailbox and Client Access Server (CAS) in the DR site.

The switchover was expected to take 30 minutes of work and users had been informed to expect a 3-5 minute window of email downtime. The system administrator, who I'll call Rick, started the process during the lunch hour in order to have the least impact upon users. While this type of work seems better suited for late at night, this company's policy was to perform these activities during the day while staff is present so any issues can be addressed as quickly as possible. As it turned out, issues appeared immediately.

The "Uh-Oh" moment

Rick kicked off the switchover by running an Exchange PowerShell command to take his primary Exchange servers out of the DAG. He then ran a command to activate the Exchange mailbox server in his DR site - and immediately hit a brick wall. Exchange was happy enough to pull out the primary Exchange servers but refused to bring up the DR Exchange server. Errors indicated the mailbox databases couldn't be mounted on the DR server since they were supposedly already mounted. After trying a few more commands Rick found his DAG completely stuck in the mud. He couldn't bring up the Exchange databases in either site nor re-add his primary Exchange servers to the DAG since he received errors that they were already part of a cluster. Email was completely dead in the water and almost an hour was rapidly eaten up in frantic event log reviews and Google searches.

Rick did the smart thing and contacted Microsoft support. They resolved the issue - sort of - by forcibly mounting the Exchange databases in the DR site and using the Failover Cluster Manager application to get the primary site Exchange servers back into the DAG. After three hours of downtime that got email up and running - or did it?

The post-mortem

I followed up with Rick later to ask how things went. He was understandably frazzled and not a little cynical about the experience.

"We found out we couldn't run email in our DR site that night," he told me. "We tested this by powering down the Exchange servers in the primary site, and nobody could open Outlook - even though our mailbox and CAS systems were supposedly working fine in the DR site. That problem lasted until we powered up the primary site servers again."

"So, what happened next?" I asked.

"Well, we still had the scheduled power shutdown to deal with, and we HAD to make sure email worked in the DR site or the company would suffer. We brought in dedicated Exchange 2010 consultants to try to figure out what went wrong with the DAG. It seemed to be okay but we didn't find any 'smoking gun' that explained what why the initial switchover bombed - even after combing through endless event logs and reviewing the Exchange best practices analyzer. Our best guess is there was a synchronization problem among the mailbox databases which confused the DAG, but the databases also seemed okay again. So, we were down to two choices: try that same datacenter switchover process over again or come up with something new."

I could tell by his tone that they had gone with the latter option. "Why were you reluctant to follow the same process?" I asked.

"Because we had no way of knowing it would work," Rick said. "I already took enough heat for the three hours of downtime - without being sure that the problem was fixed I flat-out refused to go down that road again. We did find out why Exchange failed when our primary site servers went down. We have three servers in our primary site and two in our DR site. The DAG uses a concept called 'quorum' whereby each server has a vote as to where services are live if something bad happens - and the majority of votes in either site makes the call. Since we shut down the site with the majority of votes the DAG was DOA."

"Sounds needlessly complex," I observed.

"It gets more so. Instead of doing the datacenter switchover our consultants recommended we install a dummy Exchange mailbox server in our DR site so we could successfully activate our mailbox databases there."

"On the dummy mailbox server?" I asked.

"No, on the actual production mailbox server, but we needed the additional mailbox system so Exchange could establish a 'quorum' in the DR site of three servers to allow email to work. I was skeptical - extremely skeptical - that this would succeed, since it seems to me after 10+ years of supporting Exchange that it waits for any available opportunity to break down and stab you in the back."

Let's analyze that for a second. Rick had gotten so disenchanted with Exchange that he doubted whether documented procedures and the Exchange software itself would work as expected. If a system administrator can't rely on his own systems I'd say that represents a significant crisis of faith.
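
To make the vote counting Rick describes concrete, here is a minimal sketch in Python. It models only the majority rule itself, not how Exchange or Windows Failover Clustering actually implements quorum. The function name and the tie-breaking witness vote in the last example are my own assumptions for illustration, not something Rick described.

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """Return True when the reachable votes form a strict majority."""
    return 2 * votes_online > total_votes


# Rick's original layout: 3 servers in the primary site, 2 in DR (5 votes).
primary_alone = has_quorum(3, 5)
dr_alone = has_quorum(2, 5)

# One extra server in DR gives 3 of 6 votes, which is a tie, not a majority.
# ASSUMPTION: a tie-breaking witness vote located in DR makes it 4 of 7.
dr_with_extra_server = has_quorum(3, 6)
dr_with_extra_server_and_witness = has_quorum(4, 7)

print(primary_alone, dr_alone, dr_with_extra_server, dr_with_extra_server_and_witness)
```

Under this simplified model, powering down the primary site leaves the DR site with 2 of 5 votes, which is consistent with the failure Rick saw. How the extra server satisfied quorum in his actual environment is not something I can confirm from our conversation.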

"And did that work?" I asked.

"Happily, yes. We got through the power outage after activating our mailbox databases in the DR site. As far as I'm concerned, though, it's time to look at other solutions. Weirdly enough, just the week before I rejected talk about moving email up to a hosting service since I wanted to keep it in-house. Then this happened."

"Seems like it all worked out in the end," I said.

"Well, after three hours of email downtime and having to pay outside consultants, sure," Rick replied. "However, look how much we lost. People were sitting around unable to read or reply to urgent customer messages. My group suffered a reputation blow since we'd announced only five minutes of downtime. Furthermore, the irony here is that we built an expensive fault tolerant Exchange environment that blew up the second time we tried to test a DR scenario!"

"I guess you have to be a full-time Exchange guy to manage this stuff effectively," I noted.

"That's what it boils down to. Oh, I probably could have found there was a problem before that datacenter switchover if I'd been more careful - I admit that. But I've got a bunch of other stuff going on - Citrix, monitoring, security, you name it. I don't have time to babysit Exchange. I could have been working on rolling out a new app virtualization project, but instead put thirty hours into this - not to mention all the 'Outlook is slow', 'I can't connect to email from outside the company,' 'I lost my PST file' stuff that has taken so much of my days. I've put up with about 48 hours of email downtime in my career - believe me, there's nothing less fun. I'm thinking now it's better to just move this out of the data center and have done with it. As far as I'm concerned there's no future in Exchange."

"I suppose there is for hosting providers," I mentioned.

"I mean for my career. I used to be leery of anything that seemed like a threat to IT staff. They came out with outsourcing and we all thought we'd lose our jobs. They developed virtualization and we all thought that would reduce IT headcount. Now there are hosted email systems and if anything now I think it would free me up to do the more meaningful things I just discussed. There is always going to be work for an IT pro; the question is whether it's worthwhile or not. Sure, there will still be some measure of downtime no matter where your data and services are - that goes with the territory - but the next time it happens I don't want to be the guy in the trenches when the dedicated experts ought to be there instead. My sanity bank account is overdrawn."

Rick also related other concerns about the reliability of Active Directory and Windows Server 2008, and expressed the doubt that Microsoft has a clear understanding as to how to produce forward-thinking software of genuine value.

Current statistics and future trends

I did some digging after talking to Rick and found some interesting statistics which support his perspective. Technology research organization Gartner predicts that by the end of 2014 "at least 10 percent of enterprise email will be based on a cloud or software-as-a-service model," a share it expects to rise to at least 33 percent by the end of 2017.

According to a whitepaper titled "The Case for Hosted Exchange," the following (Figure A) represents the monthly TCO (Total Cost of Ownership) figures for on-premise versus hosted Exchange:

Figure A


Google Apps for Business pricing is even simpler. (Figure B)

Figure B


Personally I prefer Google Apps hosted email solutions over Microsoft's, not because I write for the "Google in the Enterprise" blog but because I agree with some of Rick's concerns about the ongoing relevance of Microsoft and I find the Google pricing scheme more attractive.

In both cases there is no server hardware, in-house software, data center cost or backup expense. Users access their data from any location using an array of devices. A scheduled power outage would have meant little to nothing if Rick's company had hosted email in place.

Making the call

Rick's change in perspective was probably based both on emotion and logic, but I think he operated from a rational basis in both categories. Email maintenance is slowly being perceived as work which is more "custodial" and less "innovative." IT professionals are engaged in an ongoing evolution of bringing value to the business. When it comes down to it, what improves a structure more - mopping the floor or building an addition?

Hosted email isn't a magic solution completely free of drawbacks. There are still significant concerns such as data migration, access configuration, user training, security, compliance and SLAs to contend with. A careful analysis of the pros and cons must be carried out by all decision makers, whether in IT, Finance, HR, or other relevant departments. When properly planned and executed, however, it's not just the data and services which get shifted out of the organization, but the headaches and distractions as well - and perhaps it even rekindles a system administrator's creative love for technology.

In this case the facts - and not marketing propaganda or website advertisements - convinced Rick of the reality of the situation at hand and the next step in his approach to the future. It will be interesting to see how it pans out for him as well as organizations elsewhere.


Scott Matteson is a senior systems administrator and freelance technical writer who also performs consulting work for small organizations. He resides in the Greater Boston area with his wife and three children.


Having worked with DAG design and failover for a couple of years now, it sounds like "Rick" didn't design it properly and/or didn't have all of the necessary resources available (AD, DNS, Exchange Replication service, Cluster service, etc).  I say that because I have had to "Break" the DAG functionality and recover from it at my current employer for our annual DR test (successfully, I might add).
And I would have fired the consultants that suggested spinning-up a new Exchange server.  If the AD sites were designed properly, and if "Rick" had his alternate fileshare witness in place, things would have been golden. 

Additionally, the complaint that 'I can't connect to email from outside the company' is invalid as he can enable "Outlook Anywhere" in Exchange and/or allow his users to use OWA (seriously, that's why they're there...unless the complaint is in regards to the outage; the statement is tough to follow in that regard) and the gripe about "Quorum" is going to be universal to any node in a Failover Cluster (Exchange or not). 

I think it's kind of messed-up to take the ignorance and failures of one sysadmin and turn it into a mini sales-pitch for Google Apps. 



1.  Bad decision to schedule maintenance during the workday.  "Rick" got exactly what he deserved on that one.

2.  Critical Success Factors require experts to manage them; either they live in-house, or you outsource them.  Rick and Company weren't the experts on Exchange, which is why they had to go outside for consults.  Since his company isn't interested in employing an expert, the only solution is to go outside for a host.

3.  Unless you've done at least 3 successful complete restore operations on a system as currently configured, you do not have a working backup and restore capability.  And you're lying to yourself if you think you do.


Systems go down.  It's a fact.  "Rick" was really stressed for no reason.  His company made a HUGE mistake by doing their failover during the business day.  The potential impact outweighed the minimal risk.  Basic ITIL principle of Change Control.  

The DAG environment seems like it changed since it was originally configured.  Not sure if they used a consultant to do the install or not.  Regardless, while Rick wasn't an exclusive Exchange Admin, he managed to keep the environment running to the point where they had only tested their DAG once!  Symantec/Messagelabs has an email continuity cloud product that eliminates the need for DAG and is less complex. Worked great for me during my Exchange admin days. 

A lot of SMBs host their email with local providers and as storage costs rise, Enterprises are starting to question the value of the complexity of the system.  Services like Yammer are trying to eliminate email altogether!  However in a lot of environments it is just a shift in costs as shared tenant email providers have some of the same risks as an internal organization. The question is can the business afford the monthly OpEx cost of a hosted service in a bad year?  If I own my infrastructure I can ensure it works with dependent applications, plugins, etc.  A shared tenant host like Exchange or Gmail may not have that capability.



 I have spent the last 3 months with Microsoft O365 (Exchange online) problems.....we have had over 15 hours of accumulated downtime during business hours in the last 8 weeks. That's around 95% uptime....Our company relies on email for day to day operations. The worst is the IT department being held accountable for a service that is at best only marginally acceptable.
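
The 95% figure is consistent with measuring uptime against business hours only. A minimal sketch of that arithmetic in Python, assuming an 8-hour business day and a 5-day week (neither is stated above, so treat both as assumptions):

```python
def uptime_percent(downtime_hours: float, total_hours: float) -> float:
    """Percentage of the measured period during which the service was up."""
    return 100.0 * (1.0 - downtime_hours / total_hours)


# ASSUMPTION: 8 weeks of 5 business days, 8 hours each = 320 business hours.
business_hours = 8 * 5 * 8
business_uptime = uptime_percent(15, business_hours)

# For comparison: the same 15 hours measured against all hours in 8 weeks.
all_hours = 8 * 7 * 24
round_the_clock_uptime = uptime_percent(15, all_hours)

print(f"business hours: {business_uptime:.1f}%")
print(f"round the clock: {round_the_clock_uptime:.1f}%")
```

The choice of denominator matters: the same outage looks considerably better when measured around the clock, which is one reason to check how a provider's SLA defines the measurement window.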

 I have yet to see a cloud service have anywhere near the performance or reliability of an in-house solution. Cloud-based phone systems or cloud email (Microsoft or Intermedia or anyone else for that matter) is all pie in the sky. When you use an inherently unstable connection (i.e., the Internet) to be connected you only open yourself up for problems. TCO figures put forth by "cloud" "hypesters" do not take into account the soft costs nor the cost to the business when the service is degraded or down. It's the old bait and switch.

It's been around for a long time in politics: you take the same idea and re-brand it. So goes the cloud....Tymshare, ASP providers and SaaS have all been around; it was the brilliant move to call it "The Cloud" (which your average executive could understand ???). Just SSDN (Same **Stuff** different name).  Talk about TCO and all the money you will save...

Yep I am jaded. I have seen 3 cloud transitions fail due to false promises, poor performance and utter lies about the cost from the vendors. What's funny is when you call a "cloud" provider up for a ticket and all they can tell you is that either their "Cloud Provider" is down or the "Cloud" itself is down. 

Can you touch a Cloud? Hold it? Sit on it? How do you hold a "Cloud" accountable...What happens when the cloud is hacked and data is mined by Chinese or Russian mafia? We already have cloud-based services for DMVs across the US (which have caused lines to increase for licenses and auto registration). How long before we have all of our PI (personal information) in the "Cloud"...Who is accountable...Who will take the hit for all this.....surely not the CIO or CEO that made the financial decision to move to the "Cloud"

The IT manager who has no control over the service at all will be held accountable.



Couldn't the moral of this story just as easily have been, "Why NOT to use Exchange as your in-house Email system"? This is the kind of 'pass the buck' mentality that politicians rely on. Even though the problem isn't 'theirs' anymore, it will still exist, and SLAs in my experience are, "Oh, did we not meet that SLA? That's too bad. There's another service down the road. Why don't you try them? We have lots of other customers waiting for our time..." The owner of our company wants someone he can IMMEDIATELY hold accountable in the first-person, which is me. I consider my trouble-shooting skills to be fair enough that, especially with the aid of an outside consultant, we can minimize unexpected downtime. I've seen the relative skills of these consultants, and compare it with the skill of those that work for these larger companies, and I don't consider them a magic 'fix-all' solution. You're just looking at a bunch of people that, when the chips are down, have the option of taking a step back, shaking their heads, and saying, "That's too bad. We'll fix it sometime..."

In-house staff don't really have that option if they want a paycheque.

Another perspective... a properly trained and certified Exchange administrator would have been able to handle this issue with one hand tied behind their back, in the dark, using a flashlight, on an empty stomach.


I once tried to test my SBS2003 backup to see if it would actually run. I restored to a Virtual Machine but because it wasn't activated on that hardware with MS I couldn't get past the login screen. That's when I realised what a joke it was and migrated everyone to Google apps. We all sleep better now.


Regardless of who you use as a provider, you can use the open source application to encrypt your messages.


@AstroCreep It's always easy to comment on a situation we weren't involved in to play Monday morning quarterback.  With that in mind, I'd certainly like to read your thoughts on what you would have done in Rick's scenario; how you would have troubleshot the DAG problems to establish the root cause as well as what third method (besides a DAG failover/adding another mailbox server) you would have come up with as a remedy for the situation to ensure a successful site failover in lieu of the first two options.  I'll alert Rick to your feedback so he can evaluate whether your notions have any validity in his environment.  Please be as detailed as possible.

As for your allegation that my article is a sales pitch for Google, I think you missed the point that in-house email services can be too cumbersome and time-consuming to manage when there are other initiatives afoot, and that the ongoing trend now is for companies to move towards hosted email services to relieve the burden.  This makes the sharpening of Exchange skills less valuable to the future careers of IT pros than, say, virtualization or security.  

For the record, my job is not to serve as a paid shill for Google products, but instead to review and discuss them in an enterprise context to help readers decide their value.  I've written articles about Google which were both positive and negative.  For example, I stated in "Steering around the potholes with Google Drive" that I do not find Google Drive synchronization reliable, an opinion I still maintain to the present day such that I would recommend against business adoption of this storage service at least until the issues I have seen with it are ironed out.

Looking forward to your Exchange recommendations for Rick - thanks!





I work for a company of about 25 people. We used to use a traditional outsourced IT services provider. That was then replaced by an in-house guy. Throughout, we used Exchange Server and Outlook for email.

More recently, we changed company structure, lost the in-house IT professional, and it fell to me, as the most IT literate person in the company, to make some decisions about IT provision. Whilst I consider myself to be reasonably IT literate, I am not an IT professional. I work in finance. After considering the options, (in house Exchange, virtualised Exchange, Hosted Exchange, MS365 and Google Apps) we migrated to using Google Apps for Business. So far, I feel the decision has been completely vindicated.

You talk about unstable connections being a problem. Without an internet connection, we couldn't do any work at all, let alone email, so I took a stable connection as being axiomatic to the decision making process.

I think the strength of Google's solution is that it completely removes the need for any kind of professional IT input at all. We did have a few problems in getting things set up, but Google themselves were very helpful and we got it all sorted. We have used a couple of third party companies as part of our solution too through the apps market.

If we were a larger company with a full time IT professional on the staff, I sure wouldn't want him spending his time debugging Exchange issues. All that discussion above, between smmatteson, Rick and Astrocreep, about whether this approach or that is better, and about whether things were set up correctly or not - well, it's all academic (not to mention being completely incomprehensible) to me. With Google, we have had zero downtime since we made the switch, about 3 months ago. This compares to regular downtime and difficulties using Exchange over the preceding 10 year existence of our company.

Our ongoing cost has been between a fifth and a half of the quotes I received for the alternative services.

Most users here have continued to use Outlook. Those that don't have been happy to move on.

Your point about rebranding of the client-server model is a valid one. However, whilst you sound like you see this as a bad thing, I see it from the end-user perspective, and I see it as a good thing. Frankly, I couldn't care less what technologies Google use to ensure the service works. The fact is, it does, and using a company with the infrastructure scale of Google, means it works well, and reliably.

In relation to your final paragraph, well, this is where you start to come across as a bit nutty. We have a contract with Google. In the highly unlikely event that they breach our client data confidentiality, we will sue them, using the contract. It is no different to any other service provider agreement. So I'm not worried!



@klashbrook True, but he would be totally worthless on the 80 other things an IT Shop needs to keep up and running. 


@smmatteson I get what you were saying about in-house e-mail systems being cumbersome, but the same can be said for AD, vSphere, SQL, etc.  If you don't know the systems very well, any "Issue" can be a big problem. 
Additionally, your stated personal preference for gApps simply didn't sit with me as merely an in-passing comment, since your gig here on TR is for gApps. 

As for my thoughts on Rick's experience I would be curious as to the actual errors he received and if he has these two physical sites set up as separate sites in AD or one big site.  I can't really make any other recommendations until I know about them, but that would get me a better understanding of the issue. 

Lastly, anyone maintaining an Exchange 2010 DAG should look up Tim McMichael's blog (he's a PFE for Microsoft and does a lot with the inner workings of DAG).  He's done nine DAG-specific blogs thus far.  
He also published a "Data Center Switchover Tool" that has been released by Microsoft and that is quite helpful in switchover situations.  It's nothing more than a .PPT that gives you the actions to perform based on your situation, but it is helpful and it includes links for common issues in the failover, restore, failback process. 


@smmatteson Sure thing.  Glad it was helpful.  :) 

That error (0x46) is indeed the one I get when doing a switchover and requires me to run the command a second time.  

The more I think about it, it may be that mbx2 was the Primary Active Manager at the time and is why this "failed".  If that is the case, I cannot foresee a situation where you would only need to run the Restore-DAG command once. 


@AstroCreep posting this for Rick - I very much appreciate the research and feedback: 

"Yes, each site has 2 MB servers; I shut down the Exchange services then ran the "Stop-DatabaseAvailabilityGroup" command against each one.  I like your idea about referencing this by site better however.  

The sites were still connected with working links, DCs, etc. so removing that "-ConfigurationOnly" switch sounds like a good idea.  I'll have to split my datacenter failover guide into 2 parts: "if the sites are connected and the primary servers are OK" and "if this is a true exchange outage; the primary site or servers are defunct."  Clearly we were proceeding in the first instance as if it were really the second instance and the approaches should be different.

I do recall running "Start-DatabaseAvailabilityGroup –Identity 'dag-01' -ActiveDirectorySite ‘secondary site name’" during this ordeal.  However, your link helped point out a log file under C:\ExchangeSetupLogs which led me to an error to help pinpoint the problem:

"EvictDagClusterNode got exception Microsoft.Exchange.Cluster.Replay.AmClusterApiException: An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API '"EvictClusterNodeEx('') failed with 0x46. Error: The remote server has been paused or is in the process of being started"' failed.. ---> System.ComponentModel.Win32Exception: The remote server has been paused or is in the process of being started

(obviously I have substituted '' for the actual server FQDN for confidentiality purposes :)

Without going into excruciating analysis, it seems evident there was indeed a cluster problem - maybe tied in with Exchange functions, replication or something else that boils down to "the servers weren't communicating in a healthy fashion."  This gives me something to work from in a lab experiment.  I hate not identifying root causes, but just need some quality time to sit down and inspect the process step by step.  

We've added many prerequisites to a datacenter failover like verifying mailbox database replication, DAG connectivity, etc. so will also include cluster steps to determine all is well.  Thank you for all your assistance!  Microsoft support was good at getting us up and running but then beat it like the Lone Ranger since the scope of the call ("working Exchange") was fulfilled."


@smmatteson Sure thing.  :) 
Okay, here's what I have: 

1. Is there more than one MB server in each site?  If so, then this command would be okay, but you would need to run that command against each server you want to stop participating in the DAG, which is where the beauty of using the AD site parameter instead of each individual server lies.  If the command were more like this: 

Stop-DatabaseAvailabilityGroup -Identity 'DAG-01' -ActiveDirectorySite 'Site-01' -ConfigurationOnly:$true

It would have worked out better as it would have stopped all servers in the primary site with one command.  Additionally, if the sites were still connected when this went down, there would have been no need for the -ConfigurationOnly switch. That's only to be used when the DCs cannot communicate cross-site. 

2. This is correct.  This step is necessary to force the servers at the secondary site to take over the cluster and (if necessary) force quorum. 

3. This is correct; however, this is a common error and can be resolved by re-running the command.  This is actually written as a specific line-item in my DR-exercise as it happened to me every time I tested it. 
Look here for additional insight:

So, I guess it's the "When an error isn't really an error" situation.  :/   From the sounds of things though, this appears to be a potential issue with timing of the actual Cluster Service and not an issue with Exchange (splitting hairs, maybe). 
And yes, I get that it is frustrating and disheartening.  Even more disheartening and frustrating is that neither the consultant nor MS support could get it going.  

...unless, of course, this was already attempted and it was still a problem.  If that were the case, I have found in a lab that if I am having issues getting the Exchange servers to function after the Restore command has been run, it is due to the Replication service having been stopped at some point after the connection was severed.  The remedy for me was to run the following command: 

Start-DatabaseAvailabilityGroup -Identity 'dag-01' -ActiveDirectorySite 'Site-02'

I hope these have been helpful.  If not, ignore my defense of Exchange.  ;) 
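
The distinction in point 1 above, between a switchover where the sites can still talk to each other and one where they cannot, can be sketched as a small helper. This is only an illustration in Python that assembles the command text; it does not run anything against Exchange. The function name and the `sites_connected` parameter are hypothetical, and the only cmdlet parameters used are the ones already quoted in this thread.

```python
def build_stop_dag_command(dag_name: str, ad_site: str, sites_connected: bool) -> str:
    """Assemble the Stop-DatabaseAvailabilityGroup command line as text."""
    parts = [
        "Stop-DatabaseAvailabilityGroup",
        f"-Identity '{dag_name}'",
        f"-ActiveDirectorySite '{ad_site}'",
    ]
    # Per point 1 above, -ConfigurationOnly is only for the case where
    # the DCs cannot communicate cross-site.
    if not sites_connected:
        parts.append("-ConfigurationOnly:$true")
    return " ".join(parts)


# Planned switchover with healthy links vs. a true site outage.
print(build_stop_dag_command("DAG-01", "Site-01", sites_connected=True))
print(build_stop_dag_command("DAG-01", "Site-01", sites_connected=False))
```

Keeping the two cases in one place makes it harder to run the outage variant of the command during a planned switchover by mistake.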


@AstroCreep Thanks for the links - I have passed those along.  Rick says:

"Thanks for any assistance.  I think I might have a lead on this.  To fill in, the 2 physical sites are in 2 AD sites; "primary" and "secondary."

Our Exchange environment was set up by the same consultants who helped us out on this problem.  They told us how to do the datacenter failover, which starts with these 3 commands:

1.  Run Stop-DatabaseAvailabilityGroup -Identity DAG-01 -MailboxServer (primaryserver1) -ConfigurationOnly

(this was run for each of the primary site mailbox servers, so we could then bring up the DAG on the secondary site mailbox server)

2.  net stop clussvc 

(this was run on the secondary site mailbox server to stop the Cluster Service)

3.  Restore-DatabaseAvailabilityGroup -Identity DAG-01 -ActiveDirectorySite "secondary" -AlternateWitnessServer "secondary site CAS" -AlternateWitnessDirectory "path to witness directory on secondary site CAS"

This is where we hit a brick wall.  The error given was along the lines of "WARNING: Server ("primaryserver1") was marked as stopped in database availability group ‘DAG01' but couldn’t be removed from the cluster."  The DAG was then DOA.  Microsoft support had to force-mount databases on their secondary site server and use the Failover Cluster Manager MMC tool on the primary servers to take them out of the cluster and add them back in.  They couldn't really tell us why the problem happened.

I think the issue may be that we didn't need step #2 as that may be obsolete in Exchange 2010 SP1.  I have a test environment I'm going to try that out in."
