
10 stupid things people do in their data centers

Small missteps can turn into huge problems in the data center -- and that can mean big trouble for your organization (and for you).

We've all done it -- made that stupid mistake and hoped nobody saw it, prayed that it wouldn't have an adverse effect on the systems or the network. And it's usually okay, so long as the mistake didn't happen in the data center. It's one thing to let your inner knucklehead come out around end user desktop machines. But when you're in the server room, that knucklehead needs to be kept in check. Whether you're setting up the data center or managing it, you must always use the utmost caution.

Well, you know what they say about the best laid plans... Eventually you will slip up. But knowing about some of the more common mistakes can help you avoid them.

1: Cable gaffes

You know the old adage -- measure twice, cut once. How many times have you visited a data center to see cables everywhere? On the floor, hanging down from drop ceilings, looped over server racks and desks. This should simply not happen. Cable layout should be given the care it needs. Not only is sloppy cabling a safety hazard, it is also a disaster waiting to happen. Someone gets tangled up and goes down -- you run the risk of a lawsuit AND data loss, all because someone was too lazy to measure cable runs or take the time to zip-tie some Cat5.

2: Drink disasters

I know, this might seem crazy, but I've witnessed it first hand too many times. Admins (or other IT staff) enter the data center, drink in hand, and spill that drink onto (or into) a piece of equipment. In a split second, that equipment goes from alive to dead, with no chance for you to save it. Every data center should have a highly visible sign that says, "No food or drink allowed. Period." This policy must be enforced with zero tolerance or exception. Even covered drinks should be banned.

3: Electricity failures

This applies to nearly any electricity problem: accidentally shutting off power, lack of battery backups, no generator, pulling too much power from a single source. Electricity is the lifeblood of the data center. Without it, your data center is nothing. At the same time, electricity is your worst enemy. If you do not design your electrical system to prevent failures, your data center begins its life at a disadvantage. Make sure all circuit breakers (and any other switch that could cause an accidental power loss) have covers and that your fire alarms and cutoff switches are not located where they might tempt pranksters.
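
A quick back-of-the-envelope check goes a long way on the "too much power from a single source" front. The sketch below adds up planned equipment draw against a single circuit's usable capacity; the voltage, breaker rating, and wattages are made-up examples, and the 80% figure reflects the common practice of keeping continuous load well under the breaker rating -- check your own electrical code and vendor specs.

```python
# Rough circuit-loading sanity check (illustrative numbers only).
CIRCUIT_VOLTS = 208           # hypothetical branch circuit voltage
BREAKER_AMPS = 30             # hypothetical breaker rating
CONTINUOUS_LOAD_FACTOR = 0.8  # keep sustained draw at or below 80% of rating

# Hypothetical nameplate/measured draw per device, in watts
equipment_watts = {
    "rack-server-1": 450,
    "rack-server-2": 450,
    "san-shelf": 600,
    "top-of-rack-switch": 150,
}

total_watts = sum(equipment_watts.values())
usable_watts = CIRCUIT_VOLTS * BREAKER_AMPS * CONTINUOUS_LOAD_FACTOR

print(f"Planned load: {total_watts} W of {usable_watts:.0f} W usable on this circuit")
if total_watts > usable_watts:
    print("Over budget -- spread this load across more circuits.")
```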

4: Security blunders

How many keys to your data center have you given out? Do you have a spreadsheet with every name associated with every key? If not, why? If you aren't keeping track of who has access to the data center, you might as well open up the door and say, "Come steal my data!" And what about that time you propped the exit door open so you could carry in all of those blades and cables? How long was that open door left unattended? Or what about when you gave out the security code to the intern or the delivery man to make your job easier.... See where this is going?
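
Tracking keys doesn't have to be fancy; even a tiny script that appends to a log file beats relying on memory. Here's a bare-bones sketch -- the file name, fields, and example values are all hypothetical, so adapt them to whatever audit trail your organization actually requires.

```python
# Minimal key/badge issue log (sketch; file name and fields are invented).
import csv
from datetime import date

LOG_FILE = "datacenter_key_log.csv"

def record_key_issue(key_id, person, reason):
    """Append one row per key or code handed out, so access can be audited later."""
    with open(LOG_FILE, "a", newline="") as f:
        csv.writer(f).writerow([key_id, person, reason, date.today().isoformat()])

record_key_issue("DC-KEY-07", "j.smith", "on-call rotation")
```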

5: Pigpen foibles

When you step into the data center, what is your first impression? Would you bring the CEO of the company into that data center and say, "This is the empire your money has paid for"? Or would you need a day's notice before letting the chairman of the board lay eyes on your work?

6: Documentation dereliction

How exactly did you map out that network? What are the domain credentials, and which server does what? If you're about to head out for vacation and you've neglected to document your data center, your second in command might have a bit of drama on his or her hands. Or worse, you've forgotten the domain admin credentials yourself. I know, I know -- fat chance. But there's this guy named Murphy. He has this law. You know how it goes. If you're not documenting your data center, eventually the fates will decide it's time to deal you a dirty hand and you will have a tangled mess to sift through.
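
If you're not sure where to start, even a script-generated inventory beats a blank page. The sketch below writes a plain-text server list from a few records; every hostname, rack, and role in it is invented, and credentials belong in a proper password vault, not in the document itself.

```python
# Tiny inventory-to-document sketch (hostnames, racks, and roles are invented).
servers = [
    {"host": "dc1-dns01", "role": "Primary DNS / DHCP", "rack": "A3", "owner": "netops"},
    {"host": "dc1-dc01",  "role": "Domain controller",  "rack": "A4", "owner": "sysadmin"},
    {"host": "dc1-sql01", "role": "Finance database",   "rack": "B1", "owner": "dba"},
]

with open("datacenter_inventory.txt", "w") as doc:
    doc.write(f"{'HOST':<12}{'RACK':<6}{'OWNER':<10}ROLE\n")
    for s in servers:
        doc.write(f"{s['host']:<12}{s['rack']:<6}{s['owner']:<10}{s['role']}\n")
```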

7: Desktop fun

How many times have you caught yourself or your IT staff using one of the machines in the data center as a desktop? Unless that machine is a Linux or Mac desktop, one time is all it takes to send something like the sexy.exe virus running rampant through your data center. Yes, an end user can do the same thing. But why risk having that problem originate in the heart of your network topology? Sure, it'd be cool to host a LAN party in your data center and invite all your buds for a round of CoD or WoW. Just don't.

8: Forgotten commitments

When was the last time you actually visited your data center? Or did you just "set it and forget it"? Do you think that because you can remote into your data center everything is okay? Shame on you. That data center needs a regular visit. It doesn't need to be an all-day tour. Just stop by to check batteries, temperature, cabling, etc. If you fail to give the data center the face time it needs, you could wind up with a disaster on your hands.
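
If it helps make the visit a habit, even a dead-simple checklist script will do. This is just a sketch -- the checks listed are examples, so swap in whatever your own room actually needs.

```python
# Simple walkthrough checklist prompt (the checks are only examples).
CHECKS = [
    "UPS battery status lights are green",
    "Room temperature is within range (check the sensor readout)",
    "No water under the raised floor or around AC drain pans",
    "Cables still tied back, no doors propped open",
]

failed = []
for item in CHECKS:
    answer = input(f"{item}? [y/n] ").strip().lower()
    if answer != "y":
        failed.append(item)

print("All clear." if not failed else "Follow up on:\n- " + "\n- ".join(failed))
```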

9: Tourist traps

You're proud of your data center -- so much so that you want to show it off to the outside world. So you bring in the press; you allow tours to walk through and take in its utter awesomeness. But then one of those tourists gets a bit too curious and down goes the network. You've spent hundreds of thousands of dollars on that data center (or maybe just tens of thousands -- or even just thousands). You can't risk letting the prying eyes and fingers of the public get at the tenth wonder of the world.

10: Midnight massacre

Don't deny it: You've spent all-nighters locked in your data center. Whether it was a server rebuild or a downed data network, you've sucked down enough caffeine that you're absolutely sure you're awake enough to do your job and do it right. Famous. Last. Words. If you've already spent nine or 10 hours at work, the last thing you need to do is spend another five or 10 trying to fix something. Most likely you'll break more things than you fix. If you have third-shift staff members, let them take care of the problem. Or solve the issue in shifts. Don't try to be a hero and lock yourself in the data center for "however long it takes." Be smart.

Other mistakes?

Have you ever witnessed (or been part of) a data center disaster triggered by a silly mistake? Share your experiences with fellow TechRepublic members.

About

Jack Wallen is an award-winning writer for TechRepublic and Linux.com. He’s an avid promoter of open source and the voice of The Android Expert. For more news about Jack Wallen, visit his website getjackd.net.

67 comments
michael_boardman

I was once responsible for a small network. The area responsible for telecomms wanted to make some changes to their cabling going through our server room.

We said "Fine", and took them into the locked room so they could assess the setup. While there, we pointed at the power cable for the server and said "Please don't unplug that".

When we came in the day after their work was done overnight, the network was down. No, they hadn't unplugged the power: they had CUT the cable about 18 inches from where it came out of the computer.

Livid? You bet I was!

chewie2

I've been on both ends of the cabling 'rat's nest' syndrome -- having to make sense of an existing mess, and then being the one who created one. I ended up making sure I documented what was there in the part of the network I touched (it happened to be the heart of the optical interconnections for most of the other mainframe and networking systems). As a contractor, I made the first install look very neat -- but by the time the 10th-plus 'modification' was finished, things were looking very sloppy. Since all this was performed 'live', you don't have the option of neatness. I always documented the resulting networks -- but seldom had anyone in the 'client' shop to give this documentation to. Big problem: clients who contract out important work, but don't have anyone who then knows what was done.

pjboyles

The data center was undergoing a power upgrade. It was meticulously planned and contracted out to an experienced electrical firm. So the electrician went to power down the "A" side of the redundant power and somehow pulled both the "A" and "B" circuits. Several days later, after recovering all the servers, the process for access to power management changed: electricians must now be escorted by a knowledgeable data center employee to prevent pulling both circuits at the same time.

cmiglior

I remember... I went into a branch company's data center. There were two servers; on one of them (the domain controller), a little software update required a reboot. I, as the group CIO, confirmed the reboot... three days of desperate hard work to get LAN access running again... an amazing server restart. Rule: never restart a server you don't have deep knowledge of.

Trilln451

My 2 cents: This is something related to me by my boss, concerning an IT friend who was doing a system upgrade when Something Went Wrong. I think this was on a Sunday & he was under the gun to be ready for Monday. Well, when the upgrade stumbled, he decided he needed to do a system restore. Now it's a little blurry for me here because this was a while ago & I'm not an expert, but it had something to do with the system being a mirrored RAID, & he thought the good backup was the twin drive, except that that drive had actually mirrored the botched upgrade & the backup was hosed too... something like that. Long story short, total cockup. BIG data loss. Very bad horrible not good. I don't know if he was fired or they just took him out back & shot him. Oh - my point (& I do have one) is that his problem was that he didn't stop to think before he went to try & restore, because it wasn't too late to stop right up till he went to reboot. If he'd stopped to check himself, he could have at least saved the data before trying to straighten out his installation problems. I'm sure most of you probably know what happened better than I do, if you care to comment...

michael.moore

I worked in a Defence environment where rack layout planning was non-existent. Over the years they'd just added and added and added, rack upon rack. At one stage they needed to quickly replace a server, only to find they didn't have enough space to open the door wide enough to get it out. No problem... take off the door, right? Except there still wasn't enough room to fully extract the server from between the adjacent servers, as it hit the racks in front of it and behind it. To get it out they had to shift not one but two fully loaded, full-height racks, and due to the sheer weight they had to take out over half of the servers in those racks before even a trolley under them could shift them! I might add, once done they then just put them all back in the same positions rather than try to fix the problem!!!

mjlas01

Cable management is very important. Once some of our engineers were doing a cleanup in the center, and one of them dropped a heavy box on top of a fiber splice which was lying on the floor instead of being secured properly. Bottom line: damaged fiber, and it took down several critical application servers for a short period.

mjd420nova

Came with a handle called a "torque amplifier". The worst I've seen was a cleanup crew with a bad motor on a vacuum. He plugged it into the outlet in the hall to clean inside and it wouldn't work (it tripped the breaker), so he came inside to find another outlet. All the empty sockets have slugs in them to prevent anything from getting plugged into certain circuits. He searched and found a power strip under a desk. As fate would have it (Mr. Murphy had a hand in it too), the strip was daisy chained (twice). One of the pop-out breakers on the side of the strip failed and allowed the breaker in the distribution panel to blow. After he disappeared, the on-site IT guy appeared and walked into a totally silent server center. Repeated resets of the main breaker just tripped right back out. I got the call, as I was on site already on another call. I had to take the power strips out of the picture as a first step, because they've given me trouble before. Once power was on the feeder without tripping, I was able to find the shorted power strip with an ohmmeter without dumping what was already online again. Two lessons learned: one, NEVER, NEVER daisy chain power strips, and two, don't trust the power strip to be the weakest link -- back it up at the distribution points.

333239

"Unless that machine is a Linux or Mac desktop, one time is all it takes to send something like the sexy.exe virus running rampant through your data center." If you think Mac and Linux desktops are immune to viruses, then you are a liability.

fredden

No Unaccompanied Managers

A small variation on the pigpen foibles: ban all managers who suddenly remember at 5:05pm a visitor walk-through scheduled for tomorrow morning, and who take it upon themselves to collect up every loose box / piece of kit you were working with and fire it through the storeroom door from 10 feet back. This has happened 3 times to me, caused the loss of 25K worth of license keys, and one time had me crawling through the skip outside to rescue some essential fibre cards the next day. I am happy to clean when given reasonable notice, but a lack of planning on someone else's part doesn't justify taking over my area.

12 hours bottle to throttle

Happy hours at work are dangerous enough, but include shift operators / support technicians in that social mix and you are asking for trouble. Even off shift, they can try to "help" the current shift and end up trashing the system.

Briefing Briefing Briefing

So I'm working on a console and the fire alarm (accompanied by the Halon klaxon) goes off, so I exit past 2 other technicians saying "fire! exit! NOW!", who promptly ignore me and carry on working. Fortunately the Halon system was empty, but they had not been told we even had one installed. I don't have time to quote chapter and verse if there is a crisis; anyone inside the data centre needs to understand the risks and appropriate actions.

Elfen Safety

Despite the best efforts of some managers, a Data Centre is a workplace, where work gets done. Sometimes this means things can be left half-done (floor tiles up, cabinets open) while we attend to other things. Having been an engineer in an engineering workshop, I know about safety, and the best safety measure is more than 2 braincells in the cranium of the human. Look, listen, think!

Summary

The Operators manage the Data Centres day-to-day, but I make sure all involved (Project Managers, Ops, Developers) understand it is MY! Datacentre; they are just Guests. May sound overly authoritarian but it stops lunatic things happening.

carmike

This also happened in the late 80's, in the main PC room (3 mainframes, Perkin Elmer). An operator was bringing more boxes of lineflow printer paper into the area on a handtruck, and of course it was over height and over weight, and of course the person tripped and the paper boxes tumbled down. One box rolled over several times and hit the main UPS breaker. Instant blackout of mainframes, disk drives, and the lights. The only saving grace was it was a closed / non-betting day and the loss of the network had no effect. Took 4 hours to safely do a staged restart of all equipment. Needless to say, 2 things happened: (1) a wall and door were erected around the AC power reticulation, and (2) paper was only allowed into the area a box at a time. Period. We had excellent protection (built in the early 80's) from fire/famine/flood/earthquake/etc., but the human factor nailed us that day.

alan.douglas

Sorry, guys (and gals), but everything except #1 in the list is common sense in any datacentre I've ever been associated with. I only mention #1 (cable tidiness) because it's the hardest one to enforce - there's always some cowboy who strings a cable "temporarily", but forgets that "temporarily" is just a synonym for "permanently". ("If it ain't broke, don't fix it.", right?) These 10 things might apply in some tiny environment where the "IT Manager" is the person that's been pigeonholed to "look after the equipment", which consists of a couple of servers, a printer, a fax machine and an electric pencil sharpener. The comments following are far more interesting (and useful) than the article.

robo_dev

A worker brought his toddler to the data center, and the kid could not resist the big pretty red button. A service tech changing an air filter on the VESDA (smoke detection) system caused a full alarm and did an EPO for the entire SAN for five mainframes and 45,000 users. A missing support strap on a water catch pan in the data center ceiling allowed the pan to flex, so as the weight of the water increased, the water detection sensor got pulled farther and farther away from the pan as it filled with water. The sensor was in the middle of the pan, and should have tripped if any water at all was in the pan. The water detector sensor stayed dry as the pan buckled, dumping an estimated 200 gallons on top of a production SAN system, which did not play well with the electronics whatsoever.

peteystock

As another variation on the A/C issue, make sure you know where the A/C resets are on the air handlers. At one company I worked at, we found out after a scheduled building power outage that there was an additional reset we had to engage in order for the pumps to properly start recirculating again to pass the chilled water through the unit. Fortunately this was not our processing data center but it wasn't nice to see the temps in the brand-new server room go over 100.

pvdcats

They wouldn't let us buy any sort of lock for the server room or buy a UPS. This was a big DEC VAX system. One evening, the cleaning crew with a floor polisher was scheduled to wax and polish the floors of the server room. On going into the room, he unplugged the VAX to run his floor polisher. No one was amused and the next day a proper lock was put on the door and a UPS purchased.

acampo

A new fire system company comes in to test and forgets to disable the EPO to your datacenter.

pwitt

Maybe this is a given, but controlling the climate is pretty critical for the company I work for.

luriep

Just remember, there's no good reason to build a data center below the level of the water table. I don't care how many drains, how many pumps, or how many backup generators you have. Someone may say they don't "have the budget" to put that new data center on the ground floor vs. the basement (or 2nd floor!). Just remind them that there's NEVER any IT budget until CNN shows up with their TV cameras. Then budget problems go away. :)

wrparks

If your AC units can leak water into the data center, do you have a dam, emergency pump, and auto detection system? I watched a system go down because the AC drains clogged and water backed up and spilled into the data center. Literally trying to stem the tide with buckets and wet vacs. Have a plan... yes, the AC is supposed to drain, but it becomes YOUR problem when it doesn't.

desbromilow

A data centre I worked in had the emergency release (power and Halon/Inergen) button right near the exit doors, and right next to the doors was the button to let yourself out. It only took one person to ask "How do I get out?" and one operator to reply "Just push the button near the door and it will open" before the Halon button was covered with a perspex shield -- and appropriate signage.

deshar2012

Check that the system backup has been run. More than one internet transaction system has crashed after its first 6 or 9 months of operation, only for everyone to discover that no one had been taking backups.

Pete UK

I have a confession. It's long enough ago now. And it is true. During my pre-sales training with a leading global IT vendor in the mainframe-dominated days around 1980 our class of about a dozen was clustered around a communications controller in the data centre, while our instructor explained what was going on on the small display screen on top of the washing-machine-sized appliance. As he finished, I backed away and the screen went dead. Along with those of about 200 users scattered around offices in North London. Too late I realised what that little tug was I'd just felt below my belt. The fly flap of my suit trousers had caught over the big red emergency power off flange while I'd been standing there, and when I walked off, the power went with me. Takeaway: humans are dangerous. Keep them out of DCs.

umaira09

Most of the stuff I read here is not core data centre related. I teach data centres for a living, but I don't see (for example) anything related to cooling, and yet it causes substantial downtime in data centres. Not to mention accidents with fire detection and/or suppression equipment.

fiosdave

Make sure there are no conflicts with rack fans, ceiling fans, etc. I have seen installations where some fans are drawing cooling air into a rack while other fans in that rack are exhausting. It is possible to get heat trapped between the fans and cause large hotspots near sensitive equipment. Remember, heat RISES!

fiosdave

Depending on how tight you want your security, have access control to your sensitive areas. Retinal scans, fingerprint recognition, keycards, physical security or digital locks (not necessarily in that order!) Change codes/passwords when personnel leave the company (for any reason!) and/or at irregular intervals.

fiosdave

Run a simulated emergency drill at irregular intervals, to make sure that personnel know how to respond, carefully and quickly!

fiosdave

You should probably have a cabinet, centrally located, that contains necessary tools, fuses, etc. If the tools are expensive, you may want to tag them and have a detector at all doors, to alert security that someone "forgot" to leave that expensive analyzer where it belongs!

fiosdave

Make sure that all cables longer than 6" (or whatever your standard may be) are labeled at BOTH ends. It's also wise to document all cables, and keep the documentation up to date!

fiosdave

How many techs do you know who carry at least one Greenie in their shirt pocket? This is a disaster waiting to happen. All a tech has to do is climb to the top of a rack to inspect something, and the tool falls out of his pocket, down into the rack, causing a brilliant display, or dead silence. I've seen this happen to a tech who was inspecting a DuraKool relay panel, dropped his Greenie into the panel, and took down a major portion of the installation for 20 minutes! For those unfamiliar with these Xcelite products, their small tools are color coded. Search under Xcelite, or Cooper hand tools, and you will see the wide color-coded variety of tools! I, personally, own a red, blue, green, and yellow set. These are standard tools in the electronic maintenance arena, along with "orange sticks," which are wooden sticks, about half the diameter of a pencil, with flat ends. These are used where a metallic tool might cause a short. It would be interesting to hear what other items are industry-centric.

Pete6677

If the data center has redundant power, it definitely needs redundant cooling!

dennylutz

I used to run a server room with "broken" AC. Every three days, one of our guys' jobs was to change out the drip pan!! Finally, one day he forgot and flooded the whole server room. Of course, we didn't have any environmental morning system.

dave.hunt

... the company that put full air handling and chilling in the server room but "forgot" to put chillers in the UPS and telco rooms? Ahhh - the smell of boiling batteries! Can't smell anything now though!!!

abaum

The city safety inspector came for his yearly certification of the data center. He went about his business of checking the fire suppression systems and other things his official company escort had no interest in. So the escort decided to clean up some cabling on a newly installed server. The inspector finished and went to the door to leave. The door requires badge access to deactivate the magnetic latch that locks it. The inspector saw the red button on the wall labeled EPO and figured it was the button for the door. This emergency power off button did what he wanted: the door opened, and all of the power in the data center came crashing down. After countless hours of bringing up servers, running fsck on all the storage, and getting the air conditioning and networking going again, somebody decided to put a plastic cover over the switch and a proper label in big red letters.

dragan

I work in a big government organization, and the daily backup is automated. In the organizational units we have 5 to 10 tapes, and in the central server room 30. Tapes are used in a cycle. There is no monthly or yearly backup set. I have never heard of anybody trying to restore any tape, but the backup is working and working... HORROR

pjboyles

A simple server reboot should never cause extended systems outages. The design should account for reboots and for hardware failures. That the services and software did not automatically restart after a reboot exposed a serious flaw in the setup or design.

fiosdave

"crawling through the skip". I'm sure all the Brits understand that, but we on the other side of the puddle may not!

fiosdave

This reminds me of another problem we had with a floor polisher. It seems that someone decided to store our current backup tapes in the bottom drawer of a steel file cabinet. One evening, the floor polisher guy came right up against the bottom of the file cabinet (as proven by the scratch marks on the paint) and degaussed all the tapes at once! Those heavy duty polishers generate quite a magnetic field!

kitty.evans

I once worked at a data center (during mainframe days) where the snow melted and flooded the computer room subfloor. Everything had to be shut down until the problem was resolved.

Arctic.Moet

...were you immediately busted as the culprit? I mean, was it blatantly obvious that you were the cause of the outage? Personally, I would probably either faint, make a lame joke, or blame the person standing next to me. Great story!

sslevine

...too funny! At least after the fact!

333239

Turns out I had one in my drawer for years. It's a little screwdriver made by the American company Xcelite, and yes, it has a green handle.

NickNielsen

Never heard that term before in this context...

hillelana

How about an afternoon system?

Dr. Solar

I was just visiting a data center where the big red button by the door really was just to open the door. You can't imagine the shudder I felt when the escort reached out and hit that button.

sslevine

You can't make this stuff up. It really happens. :-)

333239

Archive backups are very important. I've seen it a few times where people think stuff is backed up, but what if that accidentally deleted file is not noticed until all the backup tapes have looped round? Then it is gone forever. Again, as dragan says, the only way to check whether a backup system is working is to RESTORE.

aidemzo_adanac

Dumpster diving. A skip is a dumpster in Europe. It comes from boats called 'skips', apparently a Norse term for a large boat, but more commonly one of those small, square aluminum boats that actually look like a dumpster. Just a utility boat, almost like a WWII landing craft.

NickNielsen

That tool is referred to as a "7-level" in the USAF. The only reason I use that tool today is to release cable latches!