Last week I deleted a few months’ worth of customer data. It wasn’t one of those sudden rm -rf ~customer moments, where the sinking feeling hits a split second after your little finger leaves the Enter key. The data loss wasn’t even noticed until almost 24 hours later, but the sinking feeling and accompanying nausea hit quickly after discovery. I was able to recover the lost data after a week of 20-hour days.
Unfortunately, this isn’t the first time in my eight-year career as a computer professional that something like this has happened. Six years ago, I caused a similar deletion due to inexperience. This time around, the cause was exhaustion. Between the two incidents, I’ve collected a fair bit of knowledge about what to do after doing the unthinkable, and I’ve distilled it into some disaster-recovery tips.
Don’t panic
In The Hitchhiker’s Guide to the Galaxy, humorist Douglas Adams offered this all-important piece of advice: “Don’t panic.” This is never more applicable than right after you’ve trashed a production database. Immediately after you’ve deleted a great deal of data, you’re in a better position to repair the damage than anyone else, because you know exactly what you did. Do a good job with the recovery and you can all but erase the blemish on your record that is the original error.
Be forthcoming with your manager
No matter how great the temptation—and I can assure you it’s very strong—don’t attempt a cover-up. Even if the missing data goes unnoticed, your boss will surely see the schedule slippage in your regular work due to your clandestine recovery efforts. Don’t hide behind e-mail or voicemail when coming clean. As soon as you’re able to do so, speak directly to your manager and explain what happened, how it happened, and when you’ll have a recovery plan ready.
Back up immediately
It might seem like closing the barn door after the cows have escaped, but you should make an immediate backup. When you’re working on a crippled system and doing so in a frazzled state, it’s extremely easy to make things worse. Your employment status may survive one big error, but two of them in close proximity certainly won’t help your case. Besides, knowing that you can return to the post-disaster state at any time allows you to be more imaginative in your recovery efforts. If ever there was a time to rig up another safety net, this is it.
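As a rough illustration of that safety net, here is a minimal Python sketch that copies the damaged data into a fresh, timestamped directory. It assumes the data lives in ordinary files under one directory; the function name, the "post-disaster" label, and the paths are all hypothetical. For a live database you would more likely reach for the database's own dump or snapshot tool instead.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def snapshot(source: Path, backup_root: Path) -> Path:
    """Copy `source` into a new timestamped directory under `backup_root`."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_root / f"{source.name}-post-disaster-{stamp}"
    # copytree raises FileExistsError if the target already exists,
    # so an earlier snapshot is never silently overwritten.
    shutil.copytree(source, target)
    return target


if __name__ == "__main__":
    # Demonstration on throwaway data; real paths are an assumption you supply.
    with tempfile.TemporaryDirectory() as tmp:
        damaged = Path(tmp) / "customer_data"
        damaged.mkdir()
        (damaged / "orders.csv").write_text("id,total\n1,9.99\n")
        print("Snapshot written to", snapshot(damaged, Path(tmp) / "backups"))
```

Taking a new snapshot before each risky recovery step costs little, and the timestamps tell you which state each copy represents.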
Start a log
Odds are you’ll be working on this recovery to the point of fatigue for the next few days or longer, and the more you record about your efforts, the better you’ll be able to avoid wasted effort. Jot down alternate strategies. Document failed attempts. Take note of any oddities you encounter. Meticulously record your progress. If your recovery efforts succeed, you can make your logbook a monument to the success you’ll never get to celebrate. If you’re unable to recover the lost data, your logbook can help you figure out where you went wrong, or give a head start to anyone else who tackles the problem.
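A paper notebook works fine, but if you would rather keep the log next to your terminal, something like the following Python sketch is enough. It appends timestamped, categorized lines to a plain-text file. The entry categories and the function name are my own invention for illustration, not any standard.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical entry categories, mirroring the kinds of notes described above.
KINDS = {"idea", "attempt", "failure", "oddity", "progress"}


def log_entry(logbook: Path, kind: str, note: str) -> str:
    """Append one timestamped line to the logbook and return that line."""
    if kind not in KINDS:
        raise ValueError(f"unknown entry kind: {kind!r}")
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    line = f"{stamp} [{kind}] {note}"
    # Append mode means earlier entries are never rewritten.
    with logbook.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return line
```

Calling log_entry(Path("recovery.log"), "failure", "restore from last night's dump is missing March") after each step takes seconds, and an append-only file with UTC timestamps is easy to hand to your manager or to whoever picks up the problem next.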
Keep your manager informed
Your manager is probably as worried as you are. Couple that with the fact that his or her faith in your abilities is likely at an all-time low and you’ve got one very stressed-out boss. If status reports are the currency of the developer-manager relationship, it’s time for you to pay up. Provide your boss with regular updates on your progress and plans. If possible, give him or her access to the logbook you’re keeping. It’s time for you to start the long process of regaining your boss’s trust, and you can begin by playing the part of the consummate professional.
Consult with your manager before taking any irreversible recovery actions. Again, this is both a confidence-building move and a self-preservation strategy. Management signoff will at least help protect you from any fallout that comes from steps taken during the recovery process.
Write up a postmortem
After your recovery effort is done, produce a detailed postmortem document. Describe what happened, how you discovered the problem, how you diagnosed its exact nature, the recovery strategies you tried, and the final outcome. Include a section explaining how policy or practice can be changed to make similar catastrophes less likely in the future. Be honest about your initial error, but don’t sell yourself short when explaining your recovery heroics. Your manager will appreciate a level of disclosure seldom seen from software developers, and you’ll be spared the retelling of a painful story to each of your coworkers.
Disaster recovery
How did you survive a self-inflicted disaster? Post a comment below or drop us a note.