How Do You Recover From A Network Outage?

By GreenLED
Recently, I experienced a network outage at a building I do contract work for. I had a number of horrible issues to deal with that I won't go into here. I wanted to get some opinions and thoughts from people who have dealt with an outage (as a result of a storm or some other electrical issue). I want to hear their stories on how they went about troubleshooting the problem. I have (almost) completely recovered the network in question, but I want to be more proactive should this occur again.

All Answers

Instigate a Good DR Plan

by OH Smeg Moderator In reply to How Do You Recover From A ...

That is the easy answer. Things like Exchange Servers don't like having the power removed for any reason and can be a nightmare to get back up and running again.

Honestly, the easy way is to have a UPS on all critical hardware. Any UPS connected to computers needs to auto-shutdown those computers before the batteries drop below the threshold needed to keep the connected equipment running.

Things get nasty when you have to shut down routers and switches in a controlled manner with no one there, though.
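To make the battery threshold idea concrete, here is a rough Python sketch of the decision a UPS agent has to make. It is only an illustration: `UpsStatus`, `should_shut_down`, and the 120-second safety margin are made-up names and numbers, not anything from a real UPS vendor's software, and a real agent would read these values from the UPS itself.

```python
from dataclasses import dataclass


@dataclass
class UpsStatus:
    """Hypothetical snapshot of what a UPS reports; field names are illustrative."""
    on_battery: bool
    runtime_remaining_s: float  # estimated seconds of battery left at current load


def should_shut_down(status: UpsStatus, shutdown_duration_s: float,
                     safety_margin_s: float = 120.0) -> bool:
    """Return True once the remaining runtime no longer covers a clean shutdown.

    shutdown_duration_s is how long the attached machine needs to stop cleanly;
    safety_margin_s is an assumed buffer for an optimistic runtime estimate.
    """
    if not status.on_battery:
        return False  # mains power is present, nothing to do
    return status.runtime_remaining_s <= shutdown_duration_s + safety_margin_s
```

A server that needs five minutes to stop cleanly would be told to shut down once about seven minutes of runtime remain, rather than waiting for the battery to run flat.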


DR!!! Disaster Recovery Required Unless You Like Hours of Anxiety

by drumright In reply to How Do You Recover From A ...

I have implemented a procedure for my clients and our own sites that includes:
1. Outside monitor (an external monitor running from a mail filter host, or a free monitor hosted by a monitoring provider that wants your business and will give you this for free).
2. Set the external monitor to ping all the hosted external IP addresses you use for websites, Exchange, FTP, etc. Try to use the free monitor's IP address in your firewall rule to allow ICMP echo requests/replies, or the monitor will give you a port they use to run this. Some sites give you a utility that is installed on a dedicated workstation or server and then sends hello or active statements at set intervals. After 3 unsuccessful attempts the monitor site will email, text, or do whatever you select as options for notification.
3. Setting #2 is important as once you have a power failure it usually means you cannot reach the internet from the internal LAN.
4. Like OH Smeg stated, UPS (I suggest a Smart UPS! It's worth its weight in gold, and some are quite heavy!). We use a few rack-mounted Smart UPS units (APC), which will log into the servers after a power outage occurs and shut them down the correct way. They will then log the event in a designated log file location. If you still have internet because the internet device still has power, they can also send you an alert letting you know of the issue; some can call your phone and leave you a message.
5. Once all servers are shut down correctly by the Smart UPS, you really have nothing to do until power comes back on. In some cases you can set triggers on the UPS that, if the power for the internet device is on or was never shut down, will send out another alert letting you know power is up.
6. I also implemented iLO or DRAC for all servers I manage. I then log in externally to the DRAC or iLO IP address for each server, power the servers back up, and log the reasons for shutting the server off (if APC hasn't already logged it).
7. Once all servers start up, I check event logs, internal connectivity, the external VPN connection, and that websites are turned on, and make sure my backups either A. restarted from where they were stopped when the servers shut off, or B. get started manually due to a failure.
8. Backups are important: "THE MOST CRITICAL APP/SERVICE/STANDARD that should be implemented in your environment." Get these running pronto! (if during non-production time)
9. Last but not least, check the mail server if you host your own: make sure services started up properly, make sure your store is mounted, and check mail flow by sending an email to your Gmail account or another external account you use. Then send out a company-wide email letting users know there was a power outage, and that if they were running services, apps, file transfers, downloads, or offline file synchronization at that time, they should re-check and re-run them.
10. Sit back and smile, as you did this all from the couch in your bathrobe with a cup of coffee in one hand and your kids watching cartoons on the couch with you, while you save the company from disaster :)
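For anyone who wants to see what the "3 unsuccessful attempts, then alert" logic from steps 1, 2 and 5 might look like, here is a minimal Python sketch. It is not what any particular monitoring provider runs. The names `tcp_probe` and `OutageMonitor` and the host `mail.example.com` are made up for illustration, and it checks a TCP port instead of sending an ICMP ping, since raw ICMP usually needs elevated privileges.

```python
import socket
from typing import Callable, Dict, List


def tcp_probe(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class OutageMonitor:
    """Tracks consecutive probe failures per host and reports state changes."""

    def __init__(self, hosts: List[str],
                 probe: Callable[[str], bool] = tcp_probe,
                 threshold: int = 3) -> None:
        self.probe = probe
        self.threshold = threshold  # failed attempts in a row before alerting
        self.failures: Dict[str, int] = {h: 0 for h in hosts}
        self.down: Dict[str, bool] = {h: False for h in hosts}

    def check_once(self) -> List[str]:
        """Probe every host once and return any alert messages to send."""
        alerts = []
        for host in self.failures:
            if self.probe(host):
                if self.down[host]:
                    alerts.append(f"RECOVERED: {host}")  # the "power is up" notice
                self.failures[host] = 0
                self.down[host] = False
            else:
                self.failures[host] += 1
                # Alert only once, when the threshold is first reached.
                if self.failures[host] >= self.threshold and not self.down[host]:
                    self.down[host] = True
                    alerts.append(f"DOWN: {host}")
        return alerts
```

To use it, build `OutageMonitor(["mail.example.com"])` on a machine outside the building, call `check_once()` at a fixed interval, and pass whatever it returns to your email or text notification of choice. The point of running it externally is the same as in step 3: once the power is out, nothing on the internal LAN can tell you about it.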

