This week we had a true test of our disaster recovery
procedures. At 11am on Monday, water
started pouring out of our comms room ceiling!
True disaster! Luckily the deluge
narrowly missed our server racks, but it began to pool under the raised floor and didn't
take long to spread across the room.
I immediately started shutting down servers, most critical first, and then
had to cut power to the entire floor via the main breaker. The water kept coming for two hours, and the floor
was soaked, with pools of water collecting underneath the raised tiles. We waited to assess the extent of the damage
and the potential downtime, and decided it didn't warrant switching to our
offshore site. Within four hours of the flooding we had relocated the servers providing
essential services to another part of the building, re-routed the internet
links and had all vital services running again.
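On the "most critical first" point, here is a minimal sketch of what a scripted version of that shutdown order might look like. The hostnames, priority grouping and SSH details are all hypothetical assumptions, not our actual setup; adapt the inventory to your own environment.

```python
#!/usr/bin/env python3
"""Minimal sketch of a priority-ordered emergency shutdown.

All hostnames and the priority grouping are hypothetical -- substitute your
own inventory. Assumes passwordless SSH and sudo rights to run 'shutdown'
on each host.
"""
import subprocess

# Most critical first, so those machines get a clean shutdown
# before power has to be cut to the whole floor.
SHUTDOWN_ORDER = [
    ["db-01", "db-02"],        # databases and anything holding state
    ["app-01", "app-02"],      # application servers
    ["test-01", "dev-01"],     # non-production boxes last
]


def shutdown(host: str) -> None:
    """Issue a graceful halt over SSH; don't abort if a host is unreachable."""
    subprocess.run(
        ["ssh", "-o", "ConnectTimeout=5", host, "sudo", "shutdown", "-h", "now"],
        check=False,
    )


if __name__ == "__main__":
    for group in SHUTDOWN_ORDER:
        for host in group:
            print(f"Shutting down {host} ...")
            shutdown(host)
```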
A professional flood recovery team was called in, and the source of the
water was traced to a faulty toilet overflow which had been leaking for weeks.
The water had been pooling underneath the ground floor until it finally found a way
through into the basement! The standing water
was pumped out, and industrial fans and dehumidifiers were set up and left running overnight. By Tuesday morning the room was dry; we
removed each server and checked it for any signs of water or condensation, and luckily
none were found. By Tuesday lunchtime,
within 24 hours of the incident, services were back up and running as usual.
We were very lucky that the water didn't come down directly
above our racks; that would have been a true disaster. Thankfully, because we shut down all services
and then cut power to the floor as quickly as possible, we suffered no
hardware damage or data loss. Phew!
It just goes to show that disasters do happen, so make sure
you have plans in place which you have tried and tested.