Downtime demands multifaceted alerts

Getting downtime messages through

This article was originally published in the Disaster Recovery e-Newsletter.

Recently, while I was holding for the technical support staff of my mobile phone carrier, I finally understood what clients have been asking me for years. When cell networks drop and pagers aren't working properly, how do you get the word out that something has gone wrong? Nearly every tech shop has alerts and messages routed to wireless devices, but what happens when the person in charge can't make a call?

This question has many potential answers. Perhaps the most obvious solution is to ensure that you never have only one person responsible for delivering or receiving the alerts at any given time. Although this method is fine for larger organizations, it isn't feasible for many smaller shops that may have only one technical employee. However, there are alternatives to make sure your data systems are able to reach out and touch someone.

For example, consider using multiple devices. Most IT pros don't want to look like Batman with a utility belt full of gadgets, but carrying a pager to receive alerts and a mobile phone or other device to make and take calls can be a winning combination. Pagers are not expensive these days, and monthly fees are pretty low in most areas. It's a cost-effective method to ensure that you don't inadvertently get cut out of the communications loop.

Another option is to use software, such as IBM's Tivoli or HP's OpenView, that allows multiple paths for alerts based on urgency and response. Myriad choices are available for this type of monitoring and alert software, and many are tailored to specific types of data systems. These packages page backup technical staff in the event that the on-call employee doesn't respond.

In addition, this type of software can page nontechnical personnel if all else fails. Based on the feedback from a previous column, I know that many of you already rely on nontechnical staff members who are trained to handle minor issues in remote offices. These are the people who can be put into the pager rotation to respond if no one else answers the call. Again, this should be considered a last resort.
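
To make the escalation idea concrete, here is a minimal sketch in Python of how an alert might walk down a chain of contacts until someone acknowledges it. It is an illustration only, not how Tivoli, OpenView, or any other product works internally. The names Contact, escalate, and simulated_notify are hypothetical, and the notify function is assumed to return True only when the person acknowledges the alert within whatever time limit you choose.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Contact:
    name: str
    role: str     # hypothetical labels: "on-call", "backup", "nontechnical"
    channel: str  # hypothetical labels: "mobile", "pager"


def escalate(
    alert: str,
    chain: List[Contact],
    notify: Callable[[Contact, str], bool],
) -> Optional[Contact]:
    """Try each contact in order and return the first one who acknowledges.

    Returns None if nobody in the chain responds.
    """
    for contact in chain:
        if notify(contact, alert):
            return contact
    return None


# The chain is ordered by preference: the nontechnical contact comes last,
# as a last resort.
chain = [
    Contact("Alex", "on-call", "mobile"),
    Contact("Sam", "backup", "pager"),
    Contact("Pat", "nontechnical", "pager"),
]


def simulated_notify(contact: Contact, alert: str) -> bool:
    # Assumption for this demo: the mobile network is down, so only
    # contacts reachable by pager can acknowledge the alert.
    return contact.channel != "mobile"


responder = escalate("File server offline", chain, simulated_notify)
print(responder.name if responder else "nobody responded")
```

In this simulated outage the on-call employee's mobile phone never receives the alert, so the backup contact on a pager picks it up. A real deployment would replace simulated_notify with a call into whatever paging or messaging service you actually use.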

Although the best solution will vary from one organization to another, experience has consistently shown that a single point of failure is never a good thing. That one cell phone you're counting on to send or receive a message could very well become a single point of failure, and it's one you can avoid with a little work.
