10 holiday-busting IT crises

We asked TR members to share their memories of holidays that went awry because of a work emergency. Here are 10 such tales, plus one vacation horror story.

Photo: Copyright © iStockphoto/sdominick

In my last Call for Feedback, I solicited stories about vacations and holidays that were ruined by IT emergencies. Here are 10 tales from TechRepublic readers that reflect a work culture that can be a bit... overbearing. In addition, there is one truly tragic story that serves as a reminder that sometimes, we really don't have it so bad, no matter what calls we get.

1. Omaha0428: Holiday and IT

"Being in accounting software support, I've spent many New Year's Eve and New Year's Day holidays working to support our accounting users who just can't wait for business to resume on the 2nd -- and seem to have more problems right then than any other holiday or vacation.

Recently, some of my co-workers had a vacation issue though -- all three support people were on vacation simultaneously (just asking for Murphy's Law to kick in), and they each were on the phone on a conference call until 2 AM from their respective vacation sites helping fix a critical problem. That is probably the worst one I've encountered (and I consider myself lucky!)"

My take: In IT, we often have to work when our users do. In the accounting world, year-end is an absolutely critical time, so the holidays tend to get lost. In the education world -- where most of my background lies -- critical times are the beginning and end of each semester. I will admit that I probably would not have allowed all my support people to be away from the office at the same time, but it sounds like Omaha0428 and his peers were able to handle the issue that arose without major impact on the time away.

2. mhodgdon: Camping vacation

"I took a vacation to Maine to do some camping and after being up there for only one day received a phone call from work that our T1 circuit had gone down (which handles the phones). So I made a quick phone call to our carrier (they knew us well, as we had been having this problem regularly). The next day of my vacation, I got another call from work that no one could get their documents located on the server and that they kept getting  an"IP address already in use" message from Windows. Luckily, this was on a Friday, and I did all I could remotely to get key people connected to their files. Come to find out that someone took it upon himself to rearrange a room that contained a poorly placed network switch (that part is my fault) and had created a loopback in the system. Needless to say, I quickly thereafter ordered new switches with loopback detection (we're nonprofit, so it's hard to get new equipment)."

My take: I've been here. Sometimes, we get in a hurry and can't always clean up and do everything exactly how we'd like. I've suffered from this. And I've had well-meaning people mess around with things that they really shouldn't.

This is also a lesson for making sure that organizations don't underspend! It's easy to say in hindsight, but it's really important to make sure you're able to buy equipment that has features you'll need. Nothing is worse than avoidable downtime. Mhodgdon certainly did the right thing here -- identified the root cause of the problem and corrected it so it would never happen again.

3. Kevin: No holiday for me, apparently

"About seven years ago, someone took it upon themselves to decide that the business day calendar for our scheduling system really meant Monday - Friday (including holidays). So about three minutes past midnight on New Year's Day, I got a frantic call from the data center telling me that dozens of jobs had just failed simultaneously.

That's right, the entire batch cycle started running on a non-business day. Honestly, it didn't do any damage, but I spent about four hours making sure things wouldn't get any worse. And then I called the guy who changed the calendar at 4 AM to let him know what he did.

He sounded like he'd had a fun night until then."

My take: Ouch. I guess this is a good argument for change management?
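
For anyone wondering what the calendar rule in Kevin's story might look like, here is a minimal sketch. It assumes a scheduler that checks each date before starting the batch cycle; the `HOLIDAYS` set and the `is_business_day` function are hypothetical names for illustration, not part of any particular scheduling product.

```python
from datetime import date

# Hypothetical holiday list; a real scheduler would load this
# from its own calendar configuration.
HOLIDAYS = {date(2024, 1, 1), date(2024, 12, 25)}


def is_business_day(day, holidays=HOLIDAYS):
    """Return True only for Monday-Friday dates that are not listed holidays."""
    is_weekday = day.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and day not in holidays


# New Year's Day 2024 fell on a Monday, so a plain Monday-Friday rule
# would have started the batch cycle; the holiday check prevents that.
print(is_business_day(date(2024, 1, 1)))  # False
print(is_business_day(date(2024, 1, 2)))  # True
```

Dropping the `day not in holidays` check reproduces the kind of change described above: the calendar quietly becomes "Monday - Friday (including holidays)."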

4. technical.angel: A three-hour tou... um... upgrade

"After fighting tooth and nail, I was able to get the OK to upgrade our UC voicemail system, so it would be compatible with Exchange 2010.

Since the entire university was closed from Christmas Eve until after New Year's, we decided that the 27th would be the best day to do the upgrade that the tech was saying would take about three hours. 10 hours later, the Server Admin and I (telco admin) told the tech to go home, as he had about a two-hour drive ahead of him.

I ended up putting in about 35 hours during that week. Two months later, during a conference call between the tech, our server admin, a level 3 or 4 tech at the manufacturer of the voicemail system, and a tech at Microsoft, they found the tiny little setting on the Exchange server that was causing all the problems. Of course, there was nothing documented anywhere about this setting, and its penchant for keeping the voicemail system from communicating with it, until the entire voicemail system locks up and crashes.

Great fun."

My take: I love Exchange. I really do. But Microsoft does sometimes have a tendency to bury critical settings and not always document them as well as it could. Recently, I was working with someone on upgrading Small Business Server 2003 to Small Business Server 2011. We got an incredibly generic message during the upgrade process that the Exchange installation failed. Given that SBS does a lot of things behind the scenes and there were no error messages other than the failure message, it was time to engage Microsoft product support. Ten hours later, they were able to get the Exchange migration done. Again, it was a case of not really knowing what SBS had done behind the scenes due to lack of documentation.

5. Sparky: Merry Christmas

"Christmas day 2009 around 1:00 PM, I got a call saying the power to our building was out and I needed to go in to the office. The power company guarantees us 99.9% up time, and we have a generator, not to mention various UPSes, so I had no clue how the power could be out.

Well, on the way in to the office, I saw that someone who had been partying way too much way too early in the day had driven into a transformer, which took out the power to the entire block where my office is located. (I guess the power company didn't take DUIs into consideration.) The generator failed to start, and of course the batteries didn't last the six hours it took the power company to replace the transformer. I was in the office until close to 10 that night."

My take: At least you had a plan! So many companies just hope for the best and take an "It can't happen to us" approach to DR. Or worse, they just wait for a disaster to strike and then try to put something in place. How often did you test the generator? I'm assuming that it was tested regularly and that this was just a case of a perfect storm hitting.
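
As a rough back-of-the-envelope check on Sparky's 99.9% figure, the arithmetic below shows how much downtime such a guarantee still permits. It assumes the guarantee is measured over a full year, which the story doesn't say, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent):
    """Hours of outage per year that still satisfy the stated availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


print(round(allowed_downtime_hours(99.9), 2))  # 8.76
```

Under that assumption, a single six-hour outage could still fit inside a 99.9% guarantee, which is one more reason to keep the generator and UPSes tested.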

6. clarkcomputer: Trick or treat

"I was trick-or-treating with my daughter. We were both dressed in Army camouflage. I'm a veteran; she was wearing jungle and I was wearing desert. We both had camouflage paint on our faces. I got called to come into work at our California office because we had lost connection to our office in Panama. No choice but to go into work in desert uniform and painted face. It was a 7x24 call center, so there were plenty of people there to see me. Fun times!"

My take: In some places, this might be considered normal attire, depending on how much you have to battle your coworkers!

7. a.portman: Holiday upgrades

"I work for a for-profit college. We renovated the 4th floor of the building. This meant major electrical work. It was decided that Christmas break would be the ideal time, no students in the building. But I need to be in the building while the servers are running on generator power. They run fine the whole time. Email flows, people at home can work remotely. Our other campus is up and running if largely empty. The generator runs the IT room and its servers and lights. The boiler, elevator, and nonemergency lights, no. St. Louis, MO, in late December is in the low 30s. I only get "Dell" heat."

My take: Ahh... Dell heat. Some of those units are pretty efficient at creating heat! I well know the "Christmas break" aspect of IT in higher ed. It's consistently an opportune time to perform upgrades and do other maintenance, and it's a part of the routine when one joins IT in higher education.

That's important to remember. Sometimes, a holiday interruption might be viewed as an intrusion, but different industries have different high and low periods. The low periods will often correspond with this kind of maintenance work.

8. a.portman: New Year's Day on call

"My worst nightmare was actually the night before. Remote support of the cash registers in a bar in a Las Vegas resort on New Year's Eve. Nothing happened, but I wasn't comfortable until dawn."

My take: I used to actively avoid going to certain events at work, particularly if my staff was supporting the event in some way (e.g., providing logistical support). The reason: My stress level would skyrocket and I was uncomfortable throughout the event. I trusted my staff, but every little lighting glitch or sound blip brought a noticeable look (as in, "You blew it") right at me from my boss. Sometimes, the stress level is high just because of the circumstance.

9. aaron: Corporate holiday party

"Last year we had 2/3rds of the company (and my entire five-man IT Team, of which I'm the manager). My boss and others picked on me for bringing my laptop and air card. 'We're on a dinner cruise around Manhattan, lighten up.' As it turned out, we made it through the evening without issue... but had we not, how screwed would I have been?"

My take: First of all, kudos to you for being prepared. But second of all, kudos to you for choosing a company that, at least based on this story, actually seems to care!

10. larrymcg: Y2K

"At our company (Tandem, The NonStop Company), all the support folks and managers in development were on high alert and had to stay close to work in case something happened. As we all remember, nothing much happened."

My take: I, too, remember exactly where I was at the stroke of midnight, as we ushered in the year 2000. I was sitting in the data center at Hamilton College in Clinton, NY, awaiting total failure of the phone system, server farm, electrical grid, and traffic lights. I left at 12:05 when everything went just fine. A lot of people like to use Y2K as an example of what happens when IT people overreact. ("But there was gloom and doom about Y2K and nothing happened!!") But I believe that the "non-event" was the result of excellent planning and hard work by a lot of people. The fact that it was a non-event is a testament to the monumental efforts that took place to prepare the world.

Mike: My worst nightmare

"I run a from-home business and I took my family on vacation after Christmas last year. We were away 21 days, and every day I would log into my server remotely to check email, ensure all my clients backed up, etc. Our last day --- a Saturday -- that morning I could not reach my server. I thought it was just my ISP being out or a router issue. I figured since we were flying back that evening and it was a Saturday, it could wait until our return.

We arrived home at 10 PM, and while my wife fumbled for the house keys, the driver from the airport informed us that our front door was open. We proceeded into the house to find that it had been ransacked and all of our possessions stolen. I had just spent about $25,000 on new equipment for end-of-year appropriations and it was all gone. The perpetrators had lived in our home while we were away and not one neighbor noticed (or said or assisted police) anything. They even stole our vehicles as the keys were in the house.

We had to live in a hotel for two months until our home was disinfected (blood borne pathogens). They had totally wiped us out and broke whatever they couldn't steal. It took us about six months to recover. To add insult to injury, the defendants were juveniles and only got six months probation for over $160,000 in damages and loss, along with a piddly $1,000 restitution fine that if they don't pay, it will go away when they are 21. We are terrified to ever go on vacation again."

My take: Mike, my sincerest sympathies go out to you and your family on so many levels. While I cannot possibly comprehend the sense of invasion you must feel, I can see why you're terrified to go on vacation ever again. I wanted to include your story here as a warning to others: I doubt that you announced to the world via social media that you wouldn't be home. But it's amazing how many people use Twitter and Facebook and inform would-be thieves that they'll be away. There are stories out there about increasing use of these tools by burglars, so watch out! I hope that you and your family are able to return to some semblance of normality.

Summary

In the world of IT, we always face the risk that our vacations and holidays will be interrupted by work. This schedule might vary between verticals, but the risk is there. However, the real problem comes when you're never able to get away without being interrupted. Time to rest and recuperate is critically important to a person's well-being and ability to contribute in a positive way to the organization. If your story was a one-off, good! If it's part of a recurring pattern, analyze why and make sure you find ways to get the breaks that humans need.

About

Since 1994, Scott Lowe has been providing technology solutions to a variety of organizations. After spending 10 years in multiple CIO roles, Scott is now an independent consultant, blogger, author, owner of The 1610 Group, and a Senior IT Executive w...

10 comments
flotsam70

I scheduled a vacation day for moving from a rental to our first house (it came with a mortgage :P). Unfortunately, our rental was within easy walking distance from work and the morning of the move, my supervisor came knocking, saying the Internet was down. Turned out my senior (age-wise) colleague who was supposed to act as my backup didn't know the difference between power-cycling and hard resetting a router/gateway/firewall. Sigh... Thankfully, I had a recent backup of the router configuration, so it was a relatively minor distraction. By the way, this is not a case where the backup "technician" became one by default/accident. This person has a graduate-level computer-related degree.

phudson38

Thank goodness I wasn't considered essential personnel. One evening a sewage pipe on the first floor broke. Nobody was around all weekend so the flow just kept going. When we walked in Monday morning, the smell was horrible. We found where the sewage broke through the suspended ceiling and traced the pipe back to the break. The clean up was nasty but we finally got management to understand why it wasn't a good idea to route that sewage line over the computer room.

sparky

We found out later that the generator's fuel filter was gummed up for some reason, so that's why it wouldn't start. And since then, yes. It has been religiously serviced and tested.

alexisgarcia72

Working in IT for more than 15 years has helped me understand the nature of vacations, Christmas, and any other days off for IT managers and admins. Every single time I go on vacation, I need to take my laptop with a 3G internet connection with me, just in case. I have 3 or 4 ready spare computers at the office so if something happens with a user PC, I instruct the guy at the fileroom just to replace the computer. I have almost everything redundant in the office (just in case) - Riverbed accelerators, VMware, DHCP, DNS, AD, BES. Again, just in case. I had a vacation where I was in another state and the air conditioning system in the server room failed. I needed to go back to the office just to reset the HVAC system (Sunday night), so now, when I go on vacation, I have spare air systems. The same with the power in our building: we have a big Symmetra UPS and I configured notifications so when power goes out, I get notified. If power is not back after some time, I'm able to connect remotely and shut down noncritical equipment. This allows me to have battery time for servers and network equipment for about 6 hours. If power does not return for some reason, I will need to shut down everything and users will need to use Message One from Dell. This never happens, but you need to have a plan for everything.

brettkruger

Wasn't a DUI that took out power but a hot Australian summer day caused the local transformer to overheat and catch fire. We also have a backup generator and UPS in the server room, however one of our outdoor staff had decided after the last generator test to unplug the generator battery recharge unit and plug in a cordless drill charger. The generator didn't start and the UPS ran down. Needless to say the fix was a steel cage around the generator battery charger!

Alpha_Dog

... with the ironic exception of Y2K. Good thing I celebrate on the 21st or so. What moron thought that using the holidays at year's end is a good time to do major upgrades? Yes, I understand the whole write off business expenses thing before year's end, but tied to that is everyone else's crunch time with reduced manning. If anything goes wrong, IT and key department people are going to be working some long hours with disappointed children. A little foresight earlier in the year could solve this issue. To counteract it, for the months of November and December our organization has an upgrade freeze. Hardware, infrastructure, and mission critical software do not get touched other than backups and regular maintenance until January. Does it work? Sorta. We quit working the holidays internally, but it would seem that many of our clients have not received this memo.

steve

A few years ago my boss had a spare ticket for a New Year's cruise. While I normally prefer the week of him away as an extra vacation, why not? The day we were to go to Haiti the weather was miserable, so we just cruised slowly past. His cell (yes, he had it on) goes off and the print server is down. I turn my phone on and head to the computer center. One hour later they are back up; yes, you do have to yell from that far away. Stress on a cruise, oh, and one hour of cruise ship cell and computer time: that was about $600.

maj37

The one about the cold Christmas reminded me of one at a university in Louisiana back in the 80s. The state was suffering from severe money issues and was forcing all state agencies, including our university, to cut budgets. To save money, since no one was there for the holidays, they shut off all of the HVAC in the administrative tower where we had the entire basement for our data center. Then Mother Nature took over and it got so cold the water in the HVAC pipes in the library on the second floor froze and burst the pipes. After the thaw, of course, water started flowing down to our floor. Fortunately the break was not over the computer room, but there was standing water from the hall running into the raised floor through the door. The operations supervisor used boxes of punch cards to build a dam to stop the water from going much past the door. My office was the low corner, so I had about an inch of water standing in it and anything on the floor was ruined.

jcbronson

I wish I had seen the original call for stories. It seems that sparky and I are comrades-in-surprise. Exactly one year earlier, I had nearly the exact same story. A DUI took out a transformer on Christmas morning. Our UPS maintenance had lapsed and one bad cell vented. When power was restored, several blades balked after the un-conditioned jolt made it past the failed UPS. Thankfully I wasn't the only support working on the holiday. Perhaps we will meet one day and share a beer over the coincidental stories.

jack.roberts

As jcbronson's co-worker, I too wish that I had seen the original call for stories. Christmas time, 2008, my father-in-law passed away on December 23rd. On Christmas Day, we were preparing for a rather somber Christmas dinner. That's when I got the call that we had the power failure at work. To say the least, my wife was extremely unhappy that I had to leave just before my siblings arrived. Thanks to my kind co-worker, JCB, I was able to go back to my family while he worked most of the day to get our systems back online. I can't say enough about my co-workers and their dedication when an emergency occurs or a major project is due, like moving the entire company 50 miles away. Merry Christmas, to all. JR