By Mike Talon
Developing a disaster recovery plan is only the first part of establishing an organization's overall DR strategy. Once you've put the systems in place, you need to verify on a regular basis that they still work, or you risk the failure of the DR plan during an actual disaster.
The fallacy of the "set-it-and-forget-it" approach has led to more data loss than any other potential problem normally associated with DR solutions. Many software and hardware vendors make incredibly reliable solutions and systems. These DR methodologies have great success rates when it comes time to fail over during an emergency.
But no solution will work flawlessly during a disaster if you simply implement it and then ignore it for a year. And it's usually through no fault of vendor-provided solutions—you can't expect a DR plan to be successful if you haven't properly maintained and tested it.
You should perform the first round of testing during the installation of the software and hardware tools. Since you probably schedule downtime to perform the installs and configuration, this is a great time to test the systems.
Restore data from tapes, attempt to fail over at least some—if not all—of the data systems in question, run some test transactions, and create new files to ensure everything is working the way you anticipated. Most people test DR systems at the time of implementation, but many then expect the systems to continue working over time without maintenance and continued testing.
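One way to check that a test restore produced usable data is to compare checksums of the restored files against the originals. The following is a minimal sketch, not a vendor procedure: the function names `sha256_of` and `verify_restore` are hypothetical, and it assumes both the source data and the restored copy are reachable as ordinary directories.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: str, restored_dir: str) -> list[str]:
    """List files that are missing from, or differ in, the restored copy.

    Hypothetical helper: paths are reported relative to source_dir.
    """
    source = Path(source_dir)
    restored = Path(restored_dir)
    problems = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        candidate = restored / relative
        if not candidate.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_of(candidate) != sha256_of(src_file):
            problems.append(f"mismatch: {relative}")
    return problems
```

An empty list means every source file was found in the restored tree with identical contents. Files that change between the backup and the comparison will show up as mismatches, so a check like this works best against a quiet or snapshotted source.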
But over time, environments change. Organizations upgrade networking pathways and formats, update and patch servers, replace hardware, and make other significant changes as a normal part of server operations.
Because of this dynamic and changing environment, many DR systems can become ineffective through no fault of the solution itself. For example, a security patch or upgrade could block previously allowed IP ports or protocols. Such a change could easily stop replication or prevent some backup agents from transmitting data to the tape systems.
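A blocked port of this kind can often be caught with a simple reachability check run after each patch cycle. The sketch below is illustrative only: `port_is_open` and `check_endpoints` are hypothetical names, the host names and port numbers in the usage comment are placeholders, and a successful TCP connection shows only that the port is reachable, not that replication or backups are actually succeeding.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_endpoints(endpoints, timeout: float = 3.0):
    """Return the (host, port) pairs that could not be reached."""
    return [(host, port) for host, port in endpoints
            if not port_is_open(host, port, timeout)]


# Example usage with placeholder values (replace with your own systems):
# unreachable = check_endpoints([("replica.example.com", 5432),
#                                ("backup.example.com", 10000)])
# if unreachable:
#     print("Blocked or unreachable:", unreachable)
```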
Testing at regular intervals can help catch these configuration issues and allow you to both diagnose the cause and create a solution or workaround. But how often should you test, and what methods should you use for testing? That mainly depends on the regulations that affect your company and on your internal DR policies.
Many industries have implemented regulations that require DR solution testing to occur at least once a year. And many times, these regulations specify even more frequent testing.
If any of these regulations affect your organization, then you already have a baseline for ongoing DR testing. If not, you should still test DR plans at least twice a year.
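A twice-a-year baseline is easy to track with a small date check. This is a minimal sketch under stated assumptions: the names `next_test_due` and `is_overdue` are hypothetical, and the 182-day default is simply an approximation of "every six months" that you would replace with whatever interval your regulations or policies require.

```python
from datetime import date, timedelta


def next_test_due(last_test: date, max_gap_days: int = 182) -> date:
    """Return the latest date by which the next DR test should run."""
    return last_test + timedelta(days=max_gap_days)


def is_overdue(last_test: date, today: date, max_gap_days: int = 182) -> bool:
    """Return True if more than max_gap_days have passed since last_test."""
    return today > next_test_due(last_test, max_gap_days)
```

For example, with a last test on `date(2024, 1, 1)`, `next_test_due` returns `date(2024, 7, 1)`, and `is_overdue` becomes true from July 2, 2024 onward.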
Testing can be as simple as verifying restored data or as complex as complete failover to a DR facility for a specific period of time. No matter how often and how you test, you can't just leave your organization's DR systems alone after setting them up.
Servers are dynamic systems, and they're constantly changing. You can't expect any DR system put in place today to function in the same way six months later. Regular testing and maintenance help ensure that your DR systems roll with the changes to your environment and continue to protect the company as time goes on.
Mike Talon is an IT consultant and freelance journalist who has worked for both traditional businesses and dot-com startups.