As a consultant specializing in server-side IT tasks, especially in the enterprise space, I am often faced with large-scale jobs such as Exchange server migrations. Under normal circumstances, these kinds of tasks go quite smoothly, but once in a while, we run into the job from hell. This is the story of one such job and how client involvement kept the client happy and willing to work with us throughout the process.

Getting started
The job started out easily enough. We had to migrate an Exchange 5.5 system running on a single Windows NT 4.0 box to a cluster of two Windows NT 4.0 Enterprise machines. My team worked with the client to plan the hardware and software required for the job. Immediately, a problem surfaced, as our team’s lead engineer left the firm for a better opportunity shortly after the first day of work was completed.

The client began to feel uncomfortable with my firm; losing the project lead is an extraordinarily unsettling experience. As for the rest of us, we tried to roll with the punches and show the client that she still had a great team behind the project. I assumed the role of project lead, and we continued with our work. It was just about at this point that the sheer scale of the client's Exchange situation struck us square in the face.

Under normal circumstances, even enterprise Exchange migrations tend to go fairly smoothly. You join the first node of the new cluster to the Exchange Site and then give the public folders time to replicate. When that’s done, you go back and migrate the mail accounts after business hours (to avoid corruption) and then follow some well-written white papers describing how to remove the first Exchange server from the site.

A major problem arises
In this particular case, we discovered that the private information store was quite large, and we hadn't counted on the sheer amount of time it would take to move all the mailboxes from the single server to the cluster. As mentioned above, this portion of the job had to be performed after hours, a window narrowed further by the fact that this client had offices in California (three hours behind the home office in New York). We began the migration on a Monday evening at 9:30 P.M., Eastern Standard Time.

By 1:00 A.M., we realized that because of the enormous amount of white space and other wasted space in the store, in addition to the original server being horribly overworked, the mailbox move would not finish that night. Since the server system was perfectly stable as a distributed Exchange site, we called it a night (or maybe a morning) and convened back at our offices the next day to replan the attack. The client was somewhat upset that we didn't complete the job the night before, but we were able to reassure her that the delay was simply logistical, not technical.
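The lesson here is that a five-minute sizing calculation up front would have flagged the problem before the first late night. A minimal sketch of that sanity check follows; all the figures (store size, effective throughput, overhead factor) are hypothetical placeholders, not numbers from this job:

```python
# Back-of-the-envelope estimate of how long a mailbox move will take.
# All figures are hypothetical, for illustration only; in practice the
# store size would come from the server's mailbox-resources view.

def estimate_move_hours(store_size_mb, throughput_mb_per_hour, overhead_factor=1.5):
    """Estimated wall-clock hours to move a private store of the given size,
    padded by an overhead factor for white space, retries, and a heavily
    loaded source server."""
    return store_size_mb * overhead_factor / throughput_mb_per_hour

# Example: a 30 GB private store moved at an effective 2 GB/hour
hours = estimate_move_hours(store_size_mb=30_000, throughput_mb_per_hour=2_000)
print(f"Estimated move time: {hours:.1f} hours")
```

With assumptions anywhere in that neighborhood, the estimate lands well past a single overnight window, which is exactly the kind of result that should trigger a multi-night plan before the migration starts rather than two weeks into it.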

We welcomed her to observe the entire procedure and invited her to work with us, hands-on if she preferred. While she didn't take us up on the offer at each step of the process, offering to involve the client eased fears and restored confidence in the team. That was a good thing, because the mail migration that should have taken two nights stretched into two weeks. Every night possible, team members spent most of the evening transferring mail accounts from the old single server to the new cluster. We also devised a method of remote control using RemotelyAnywhere through a RAS connection to continue the migration from home on weekends, while maintaining security and data integrity.

When we finally completed the mailbox migration, we moved the Exchange site completely onto the cluster and removed the stand-alone server entirely. It was at this point that the team breathed a sigh of relief and began to think the worst was over. We were wrong.

Additional issues
The calls from the client started almost immediately. Functions of Outlook Web Access (not supported by Microsoft in a cluster configuration) were not working properly. E-mail attachments were not being transmitted. Backup solutions were not working correctly. In the end, to keep the client's confidence in the team at its highest level, we had no choice but to return to the site without hesitation each time a problem surfaced and fix it immediately. Ironing out all the individual issues took weeks. It seemed as though each time we corrected one problem, two more rose up to meet us. Anyone who has done server migrations is certainly familiar with this phenomenon, but it's still frustrating when it happens to you.

Ultimately, we were able to provide a solution that met the client’s needs, within a fairly reasonable timeframe. Granted, it took a lot longer than planned (and is actually still being tweaked), but constant client interaction and involvement ensured that the job that would not die could at least be called a success.

Tell us your stories of upgrades gone awry

We look forward to getting your input and hearing about your experiences regarding this topic. Join the discussion below or send the editor an e-mail.