I’m not sure how long it was down before I became aware of the failure, probably somewhere between one and sixteen hours. I wasn’t the first one to notice there was a problem; one of my users approached me at about 9 AM to say that he was having trouble printing to one particular printer, and at the end of my troubleshooting trail I discovered that dreaded Blue Screen of Death on my Primary Network Server:
0x000000f … INACCESSIBLE_BOOT_DISK (or something like that, I don’t remember exactly.)
In addition to acting as my network’s primary server, it also had a couple of JetDirect printers routed through it. But why didn’t I hear multiple user complaints about the network being down, files being inaccessible, email not working, etc.? Because my backup server was on-line and ready to do its job - which is to take over seamlessly in case of a failure - and it actually worked. Wow! I dodged that bullet!
No problem, I thought. I’ll just go through the recovery process and restore everything to its original working condition. No such luck - the new error message said Windows had detected numerous unrecoverable problems. Okay, I thought, I do have that second drive in a mirrored RAID configuration; perhaps I can remove the failed drive and rely on the second one. Again, no go. In the end, I had to reinstall the server operating system on a brand-new drive.
I’m still not sure why it all failed, except that it was due to system file corruption - the source of which remains unknown. And the second drive in the mirrored RAID configuration, of course, suffered the same ill fate: mirroring faithfully replicates corruption right along with everything else.
This was certainly a lesson in the importance of having a backup server in place, even in a small network environment. The potential user downtime could have been very costly. But while the incident showed me what I did right, it also showed me what I did wrong.
What I did right: I had a backup server in place, fully replicating the domain configuration and ready to seamlessly assume control. Something else I did right? I had a backup expert available to come in and help me configure the server replication and assignment/reassignment tasks - something he did in a couple of hours, but that might have taken me a day or more to figure out (since it’s something I might do once every six or seven years!).
What I did wrong: I didn’t have an image of the server’s hard drive ghosted to a backup location. If I had, I could have easily restored everything in probably an hour’s time. Why didn’t I? Well, I plead the fifth on that one. But you can bet your sweet bippie that I do now! (Due credit given to Dick Martin of Rowan & Martin’s Laugh-In for my use of the term “bippie”.)
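The imaging lesson above can be sketched with standard tools. This is a minimal illustration, not the actual ghosting product I ended up using: a small scratch file (the names here are made up) stands in for the server disk so the commands are safe to run, `dd` takes a raw block-for-block image of it, and checksums verify the copy before it’s trusted. On a real server you would image the actual system disk (e.g. /dev/sda), booted from rescue media so the disk isn’t in use.

```shell
# Create a small file that stands in for the server's disk (hypothetical name).
dd if=/dev/zero of=fake_disk.img bs=1M count=4 status=none
# Write some stand-in "system files" into it without truncating it.
echo "important system files" | dd of=fake_disk.img conv=notrunc status=none

# Take the image: a raw block-for-block copy to a backup location.
dd if=fake_disk.img of=server_backup.img bs=1M status=none

# Verify the image matches the source before relying on it.
sha256sum fake_disk.img server_backup.img
```

The verification step matters: an image you have never checksummed is a restore plan you have never tested.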