Facilities management in an IT infrastructure is analogous to the props, lighting, scenery, and sound in a major theatrical production. All the elements must work in harmony for users (the audience) to get the most out of their experience. Avoid these five facilities management mistakes and you can be sure that the IT show will go on.
Mistake # 1: Presuming major components of facilities management are all addressed
If you were to ask typical infrastructure managers to name the major components of facilities management, they would likely mention common items such as air conditioning, electrical power, and perhaps fire suppression. Some may also mention smoke detection, uninterruptible power supplies (UPS), and controlled physical access. Few of them would likely include less common entities, such as electrical grounding, vault protection, and static electricity.
Below is a more comprehensive list of the major components of facilities management.
- Air conditioning
- Electrical power
- Static electricity
- Electrical grounding
- Uninterruptible power supply (UPS)
- Backup UPS batteries
- Backup generator
- Water detection
- Smoke detection
- Fire suppression
- Facility monitoring with alarms
- Earthquake safeguards
- Safety training
- Supplier management
- Controlled physical access
- Protected vaults
- Physical location
- Classified environment
Temperature and humidity levels should be monitored constantly, either electronically or with recording charts, and reviewed once each shift to detect any unusual trends. Electrical power includes continuous supply at the proper voltage, current, and phasing as well as the conditioning of the power. Conditioning improves the quality of the electricity for greater reliability. It involves filtering out the effects of stray magnetic fields, which can introduce unwanted inductance, and of stray electrical fields, which can introduce unwanted capacitance, and providing surge suppression to guard against voltage spikes. Static electricity, which affects the operation of sensitive equipment, can build up in nonconductive materials, such as carpeting, clothing, draperies, and other insulating fibers. Antistatic devices can be installed to minimize this condition. Proper grounding is required to prevent outages and potential human injury due to short circuits. Another element sometimes overlooked is whether UPS batteries are kept fully charged.
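The once-per-shift review of temperature and humidity readings can be partly scripted when the readings are captured electronically. The following is a minimal sketch, not a prescribed method: the function name, the three-shift baseline, and the tolerance of 2.0 units are all illustrative assumptions that a real data center would tune to its own equipment.

```python
from statistics import mean


def flag_unusual_shifts(readings_by_shift, baseline_shifts=3, tolerance=2.0):
    """Return the names of shifts whose average reading drifts from recent history.

    readings_by_shift: list of (shift_name, [readings]) in chronological order.
    baseline_shifts:   number of preceding shifts used as the baseline (assumed value).
    tolerance:         allowed deviation from the baseline average, in the same
                       units as the readings (assumed value).
    """
    flagged = []
    for i, (name, readings) in enumerate(readings_by_shift):
        # Skip shifts without enough history or without any readings.
        if i < baseline_shifts or not readings:
            continue
        window = readings_by_shift[i - baseline_shifts:i]
        previous = [mean(r) for _, r in window if r]
        if not previous:
            continue
        baseline = mean(previous)
        if abs(mean(readings) - baseline) > tolerance:
            flagged.append(name)
    return flagged
```

A script like this only narrows down which shifts deserve a closer look; the review itself still belongs to a person.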
Water and smoke detection are common environmental guards in today’s data centers, as are fire suppression mechanisms. Facility monitoring systems and their alarms should be visible and audible enough to be seen and heard from almost any area in the computer room, even when noisy equipment, such as printers, is running at its loudest. Equipment should be anchored and secured to withstand moderate earthquakes. Decades ago, large mainframes were safely anchored, in part, by the massive plumbing for water-cooled processors and by the huge bus and tag cables that interconnected the various units. In today’s era of fiber-optic cables, air-cooled processors, and smaller boxes designed for nonraised flooring, this built-in anchoring of equipment is no longer as prevalent.
You should include emergency preparedness for earthquakes and other natural or man-made disasters as a basic part of general safety training for all personnel working inside a data center. They should be knowledgeable about emergency powering off, evacuation procedures, first-aid assistance, and emergency telephone numbers. Training data-center suppliers in these matters is also recommended.
Most data centers have acceptable methods of controlling physical access to their machine rooms, but this is not always the case for vaults or rooms that store sensitive documents, check stock, or tapes. The physical location of a data center can also be problematic. A basement level may be safe and secure from the outside, but it might also be exposed to water leaks and evacuation obstacles, particularly in older buildings. Locating a data center along the outside walls of a building can sometimes expose it to sabotage from the outside. Classified environments almost always require data centers to be located as far away from outside walls as possible to safeguard them from outside physical forces, such as bombs or projectiles, as well as from electronic sensing devices.
In fairness to infrastructure managers and operations personnel, several of these components may be managed by the facilities department, in which case no one in IT would have direct responsibility for them. But even then, infrastructure managers and operations personnel would normally want and need to know whom to contact in the facilities department for specific types of environmental issues.
Mistake # 2: Believing that the roles and responsibilities of key individuals are clearly defined and understood
It’s important to identify the key individuals who participate in facilities management, define their roles and responsibilities, and effectively communicate that information. Clearly defining the areas of responsibility and, more important, the degree of authority between IT and facilities is usually the difference between resolving a facilities problem in a data center quickly and efficiently and dragging out the resolution amid chaos, miscommunication, and strained relationships.
For example, suppose a power distribution unit feeding a critical server fails. A computer operations supervisor would likely call in electricians from the facilities department to investigate the problem. Their analysis may find that the unit needs to be replaced and that a new unit will take days to procure, install, and make operational. Facilities and IT need to brainstorm alternative solutions to determine each option’s costs, time, resources, practicality, and long-term impact, and all this activity needs to occur in a short amount of time—usually less than an hour. This is no time to debate who has responsibility and authority for the final decisions. That needs to have been determined well in advance. Working with clearly defined roles and responsibilities shortens the time of the outage to the clients, lessens the chaos, and reduces the effort toward a satisfactory resolution.
The lines of authority between an IT infrastructure and its facilities department will vary from shop to shop depending on size, platforms, degree of outsourcing, and other factors. The key point here is that the two departments must clearly agree upon these boundaries, communicate them to their staffs, and ensure compliance with them.
The mistake arises when one or more of these three parts (identification, definition, and communication) is believed to have occurred but, in actuality, has not. The mistake becomes fatal when the data center and the facilities department each believe the other is following up on an incident, resulting in major extended outages because neither group took action.
Mistake # 3: Thinking that the owner of the IT facilities management process is adequately qualified and trained
One person should be assigned the role and responsibility of facilities management process owner. Our next mistake occurs when infrastructure managers presume that this person is automatically qualified and trained for the assignment. The mistake becomes fatal if a major incident, such as a physical disaster, pushes the person beyond the skill level he or she is able to handle.
The owner of the facilities management process almost always resides in the computer operations department. There are rare exceptions—small shops or those with unique outsourcing arrangements—in which the facilities management process owner is part of the facilities department and matrixed back to IT or is part of the IT executive staff. In any event, the selection of the person assigned the responsibility for a stable physical operating environment is an important decision.
Mistake # 4: Relying solely on environmental monitoring to eliminate supplemental analysis
IT facilities managers sometimes believe that the more they automate the monitoring of their data centers and server rooms, the less effort they will need to expend to ensure stability. This is a natural conclusion but a flawed one. A patient in intensive care hooked up to dozens of monitoring devices still requires doctors and nurses to periodically check the patient’s vital signs. This serves to record the current condition of the patient and to verify the proper operation of the equipment. Similarly, IT facilities managers need to evaluate, rather than simply monitor, the current state of their data center’s physical environment. The mistake of relying solely on monitoring systems can become fatal if sensors, alarms, or other types of annunciator systems fail during a major disaster.
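One small way to verify that monitoring equipment is still operating, in the spirit of the nurse checking the monitors, is to look for sensors that have gone quiet. The sketch below assumes that each sensor reports periodically and that the time of its last report is available; the function name, the sensor names in the usage note, and the 15-minute silence limit are hypothetical.

```python
from datetime import datetime, timedelta


def find_silent_sensors(last_seen, now, max_silence=timedelta(minutes=15)):
    """Return the sorted names of sensors whose last report is older than max_silence.

    last_seen:   dict mapping sensor name to the datetime of its last report.
    now:         the current time, passed in so the check is easy to test.
    max_silence: longest acceptable gap between reports (assumed value).
    """
    return sorted(
        name for name, seen in last_seen.items() if now - seen > max_silence
    )
```

For example, if a hypothetical sensor named `water-2` last reported 30 minutes ago and every other sensor reported within the last 15 minutes, the function returns `["water-2"]`. A silent sensor is only a prompt for a physical inspection, not proof of a failure.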
A number of information sources can assist data center managers in evaluating the current state of their physical environment. Outage logs normally associated with availability reports should point to the frequency and duration of service interruptions caused by facilities. If the problem management system in use includes a robust database, it should be easy to analyze trouble tickets caused by facilities issues and to highlight trends, repeat incidents, and root causes.
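As an illustration of this kind of analysis, the sketch below summarizes trouble tickets exported from a problem management system. The field names (`category`, `component`, `root_cause`, `outage_minutes`) and the repeat threshold are assumptions made for the example; a real system will have its own schema.

```python
from collections import Counter


def summarize_facility_tickets(tickets, repeat_threshold=2):
    """Summarize facilities-related tickets.

    tickets: list of dicts with the assumed keys 'category', 'component',
             'root_cause', and 'outage_minutes'.
    repeat_threshold: minimum ticket count for a component to be reported
                      as a repeat incident (assumed value).
    """
    facility = [t for t in tickets if t["category"] == "facilities"]
    by_cause = Counter(t["root_cause"] for t in facility)
    by_component = Counter(t["component"] for t in facility)
    repeats = sorted(
        component for component, n in by_component.items() if n >= repeat_threshold
    )
    return {
        "count": len(facility),
        "outage_minutes": sum(t["outage_minutes"] for t in facility),
        "by_cause": by_cause,
        "repeat_components": repeats,
    }
```

The `by_cause` counter points toward root causes, `repeat_components` toward repeat incidents, and running the summary over successive reporting periods gives a rough view of trends.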
Mistake # 5: Ignoring the nurturing of human relationships
Data center managers sometimes ignore the value of developing strong, personal relationships with key individuals outside of their own departments. These external individuals will vary from shop to shop, but usually include the managers and foremen of the company’s facilities department and representatives of government inspection agencies. IT managers responsible for data center facilities do not always view relationship building as an integral part of their jobs, focusing instead on the more technical, nonhuman aspects of their work.
This mistake can become fatal if data center facilities managers alienate these key external individuals to such an extent that those individuals delay critical physical expansions or upgrades required to sustain a stable operating environment.
Understanding and, more importantly, avoiding these five mistakes can help you sustain the continuous online services of a computer center. This, in turn, can prevent your online performance from coming to a screeching—and unnecessary—halt.
For more information on the Harris Kern Enterprise Computing Institute, visit www.harriskern.com.