Disaster recovery is becoming increasingly important for businesses aware of the threat of both man-made and natural disasters. A disaster recovery plan will not only protect your organization’s essential data from destruction but will also help you refine your business processes and enable your business to resume operations after a disaster. Although each organization has unique knowledge and assets to maintain, general principles apply to disaster recovery. The planning guidelines below can help your organization move forward with an IT disaster recovery project.
Accountability and endorsement
A key factor in the success of your disaster recovery plan is holding someone on the executive management team accountable for it. That person could be the CIO, CTO, or COO or, in a small company, the IT director. Whether this executive manages the disaster recovery preparations directly or delegates them, the entire organization must know that executive management considers the preparations essential; otherwise, you won't get the cooperation you need from the staff involved. Without that endorsement, collecting the information the project depends on will be much more difficult. Even when the project is delegated, upper management must make its importance clear to everyone in the organization.
Identify and organize your data
One of the first steps in putting together a disaster recovery plan is to identify your mission-critical data. Doing so will help you understand what data you need to back up for off-site storage. It will also prompt you to document why you need this data and plan how to put it back where it belongs in the event of a recovery operation.
Next, ask your users to help you organize the data in intuitive directories on a central server. Even if you plan to back up everything, knowing which files are where is a key part of the recovery process. If, for example, your customer data is spread across different users' hard drives when disaster strikes, finding it will not be easy. Restoring the data from backup media is only half the battle: once it's restored, you don't want to be walking around the office asking, "Does anyone know where we keep the XYZ contract?" Organize the data before you back it up.
Some data types that you should take into consideration for organization on a central repository are as follows:
- Key customer files: contracts, agreements, contact information, proposals
- User login data: profiles, UNIX dotfiles, CONFIG.SYS files, AUTOEXEC.BAT files
- Network infrastructure files: DNS, WINS, DHCP, router tables
- User directories
- Application data: databases, Web site files
- Security configuration files: ACLs, firewall rules, IDS configuration files, UNIX password/shadow files, Microsoft Windows SAM database, VPN configuration files, RADIUS configuration files
- Messaging files: key configuration files, user mailboxes, system accounts
- Engineering files: source code, release engineering code
- Financial and company files: general ledger, insurance policies, accounts payable and accounts receivable files, incorporation registration, enterprise resource planning (ERP) data
- License files for applications
Aside from the data itself, your company needs to have an up-to-date hardware and software asset inventory list on hand at all times. The hardware list should include the equipment make, model, and serial number, and a description of what each particular piece of equipment is being used for. The software inventory should be similar, with the vendor name, version number, patch number, license information, and what the software is being used for. The information for each piece of equipment and software on the list should be mapped to the corresponding devices on the company network map. Be sure to include all cables and connectors, as well as peripheral devices such as printers, fax machines, and scanners.
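One way to keep the inventory and the network map in sync is to store the inventory as structured records and check it against the map automatically. The sketch below is a minimal illustration, not a prescribed format: the class names, the field names, and especially `map_node` (an ID linking each entry to a device on the network map) are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class HardwareAsset:
    make: str
    model: str
    serial_number: str
    purpose: str
    map_node: str  # hypothetical ID of this device on the network map


@dataclass
class SoftwareAsset:
    vendor: str
    name: str
    version: str
    patch_level: str
    license_info: str
    purpose: str
    map_node: str  # hypothetical ID of the device the software runs on


def unmapped_assets(assets, network_map_nodes):
    """Return inventory entries whose map_node is missing from the network map."""
    return [a for a in assets if a.map_node not in network_map_nodes]


def undocumented_nodes(hardware, network_map_nodes):
    """Return network-map nodes that have no hardware inventory entry."""
    documented = {h.map_node for h in hardware}
    return sorted(set(network_map_nodes) - documented)


if __name__ == "__main__":
    # Illustrative entries only.
    hardware = [
        HardwareAsset("ExampleCo", "SRV-100", "SN123", "File server", "fs01"),
        HardwareAsset("ExampleCo", "RTR-20", "SN456", "Edge router", "rtr01"),
    ]
    software = [
        SoftwareAsset("ExampleSoft", "MailServer", "5.2", "SP1",
                      "25 seats", "Email", "mail01"),
    ]
    network_map = {"fs01", "rtr01", "dns01"}
    print(unmapped_assets(hardware + software, network_map))
    print(undocumented_nodes(hardware, network_map))
```

Running a check like this whenever the inventory changes flags both directions of drift: equipment that was bought but never added to the map, and devices on the map that nobody recorded in the inventory.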
You might want to submit the asset inventory list to your insurance company once a year.
Restoration and recovery procedures
Imagine that a disaster has occurred. You have the data; now what do you do with it? Without restoration and recovery procedures, your data won't be nearly as useful to you. With the data in hand, you need to be able to re-create your entire business on brand-new systems, which means you need procedures for rebuilding systems and networks. System recovery and restoration procedures are typically best written by the people who currently administer and maintain the systems. Each system's recovery procedure should indicate which software versions and patches to install on which hardware platforms, and which configuration files to restore into which directories. A good procedure includes step-by-step instructions down to the command level: which commands to type and in what order.
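A command-level procedure can be kept as an ordered list of steps so that it is easy to review and to rehearse. The following is a minimal sketch, assuming a hypothetical `RecoveryStep` record and an `execute` callback you supply; the included `dry_run` executor only prints each command, and the two example commands are placeholders rather than a real mail-server procedure.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class RecoveryStep:
    description: str
    command: str  # the exact command the administrator types at this step


def run_runbook(
    steps: List[RecoveryStep], execute: Callable[[str], bool]
) -> Tuple[List[str], Optional[str]]:
    """Execute steps strictly in order and stop at the first failure.

    Returns the descriptions of completed steps and the description of the
    step that failed (None if every step succeeded).
    """
    completed: List[str] = []
    for step in steps:
        if not execute(step.command):
            return completed, step.description
        completed.append(step.description)
    return completed, None


def dry_run(command: str) -> bool:
    """Print the command instead of running it; useful when reviewing a runbook."""
    print("would run:", command)
    return True


if __name__ == "__main__":
    # Hypothetical runbook; the commands are placeholders.
    mail_server = [
        RecoveryStep("Check the OS release", "cat /etc/os-release"),
        RecoveryStep("Restore mail configuration",
                     "tar -xzf /restore/mail-config.tar.gz -C /"),
    ]
    print(run_runbook(mail_server, dry_run))
```

Stopping at the first failed step mirrors how a written procedure should be followed: if one command fails, later steps that depend on it are unlikely to succeed.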
Document decision-making processes
Recovering your data, systems, and networks is one thing; recovering the knowledge held by staff you lose is quite different. You will never be able to recover that knowledge completely, but you can mitigate the loss by documenting decision-making processes in flowcharts. Have each staff member identify the decisions they make and then create flowcharts of their thought processes. Sample decisions include:
- How much do you charge for a new service?
- How do you know whether a particular new project is worth taking on?
- How do you evaluate new business?
- How do you decide whom you should partner with?
- How do you decide who your sales prospects are?
- How do you decide who your suppliers are?
- When a call comes in to your help desk, how does it get routed?
- What are your QA procedures for your product?
You can't document every decision your staff is capable of making, so don't ask them to try. Instead, ask each person to document the three most important decision-making processes they use on a regular basis. You can add processes to your disaster recovery plan over time; for example, you might have employees write three new decision-making flowcharts each year at their annual reviews.
Backups are key
As an IT or network administrator, you need to bring all your key data, processes, and procedures together through a backup system that is reliable and easy to replicate. Your IT director's most important job is to ensure that all systems are backed up on a reliable schedule. Though this seems obvious, it is often neglected. Assigning backup responsibilities to an administrator is not enough. The IT department needs a written schedule that specifies which systems are backed up when and whether each backup is full or incremental, and the backup process itself must be fully documented. Finally, test your backup process to make sure it works. Can you restore lost databases? Can you restore lost source code? Can you restore key system files?
You also need to store your backup media off-site, preferably at least 50 miles from your present office. Numerous off-site storage vendors, such as Iron Mountain, offer safe media storage. Even if you use such a vendor, it doesn't hurt to send your weekly backup media to one of your field offices, if you have any.
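A written backup schedule and a restore test can both be made checkable. The sketch below assumes a hypothetical schedule (the system names `dns01`, `db01`, `fs01` and their full-backup weekdays are illustrative) in which each system gets one full backup per week and incrementals on the other days. It also shows one common way to test a restore: restore to a scratch location and compare SHA-256 hashes against the originals. Note that this comparison only reports files missing from or altered in the restored copy; it does not flag extra files.

```python
import hashlib
from datetime import date
from pathlib import Path

# Hypothetical written schedule: system name -> weekday of its full backup
# (Monday = 0 ... Sunday = 6). Every other day gets an incremental backup.
FULL_BACKUP_DAY = {
    "dns01": 6,  # Sunday
    "db01": 6,   # Sunday
    "fs01": 5,   # Saturday
}


def backup_type(system: str, day: date) -> str:
    """Return 'full' or 'incremental' for a system on a given date."""
    return "full" if day.weekday() == FULL_BACKUP_DAY[system] else "incremental"


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir: Path, restored_dir: Path) -> list[str]:
    """Report files missing from, or different in, a test restore."""
    problems = []
    for original in original_dir.rglob("*"):
        if not original.is_file():
            continue
        relative = original.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_of(original) != sha256_of(restored):
            problems.append(f"differs: {relative}")
    return sorted(problems)
```

An empty list from `verify_restore` means every original file came back intact; anything else tells you which files the backup process failed to protect.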
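If you want to check a candidate storage site against the 50-mile guideline, the straight-line (great-circle) distance between two latitude/longitude points can be computed with the haversine formula. This is a minimal sketch using a mean Earth radius of 3,958.8 miles; it measures distance as the crow flies, not driving distance, and the coordinates you pass in are your own.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def far_enough(office, storage_site, minimum_miles=50):
    """True if the storage site is at least minimum_miles from the office.

    office and storage_site are (latitude, longitude) pairs in degrees.
    """
    return distance_miles(*office, *storage_site) >= minimum_miles
```

For a sense of scale, one degree of longitude at the equator is about 69 miles, so two points a single degree apart there clear the 50-mile threshold while points half a degree apart do not.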
Let’s say for a moment that the worst occurs and your business is devastated by a disaster, to the point where you need to rebuild your business from scratch. Here are some of the key steps you should take to recover your operations:
- Notify your insurance company immediately.
- Identify a recovery site where you will bring your business back up.
- Obtain your asset inventory list and reorder all lost items.
- Distribute a network map and asset inventory list to your recovery team.
- As the new hardware comes in, have your recovery team connect the pieces.
- Restore your network infrastructure first (DNS servers, routers, etc.).
- Restore your application servers second.
- Restore your user data third.
- Perform any necessary configuration tweaks according to your guidelines.
- Test all applications for functionality.
- Test all user logins.
- Put a notice on your Web site stating that your business was affected by a disaster.
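The restoration order in the steps above (network infrastructure first, application servers second, user data third) can be encoded directly so that a recovery team works from a single ordered list. This is a minimal sketch; the tier labels and system names are hypothetical.

```python
# Restoration tiers, following the order in the recovery steps:
# network infrastructure first, application servers second, user data third.
TIER_ORDER = {"infrastructure": 1, "application": 2, "user_data": 3}


def restore_order(systems):
    """Sort (name, tier) pairs so lower tiers are restored first.

    Python's sort is stable, so systems within the same tier keep the order
    in which they were listed in the recovery plan.
    """
    return [name for name, _tier in sorted(systems, key=lambda s: TIER_ORDER[s[1]])]


if __name__ == "__main__":
    plan = [
        ("crm-app", "application"),
        ("home-dirs", "user_data"),
        ("dns01", "infrastructure"),
        ("core-router", "infrastructure"),
        ("mail01", "application"),
    ]
    print(restore_order(plan))
```

Relying on a stable sort keeps any finer ordering the plan already expresses within a tier, for example bringing DNS up before the other infrastructure.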
In a real disaster, it’s likely that not everything will be recoverable. Your goal should be to recover enough data, processes, and procedures to get your business up and running as quickly as possible once you’re in a new office.
Testing your plan is key to ensuring its success. A good place to test it is a lab. Starting with bare systems that aren’t connected to your production network, see how fast you can install and configure your systems and restore essential data. The best test uses a recovery staff other than the people who use and administer the systems every day. Because they aren’t familiar with your systems and applications, they’ll uncover deficiencies in the processes and procedures you’ve documented. Time each recovery drill and try to reduce the recovery time with every practice run.
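Timing drills is easier if each step's duration is recorded the same way every time, so that successive drills can be compared step by step. This is a minimal sketch with a hypothetical `DrillTimer` helper built on `time.perf_counter` and `contextlib.contextmanager`; the step names are whatever your recovery procedures call them.

```python
import time
from contextlib import contextmanager


class DrillTimer:
    """Record how long each recovery step takes during a practice drill."""

    def __init__(self):
        self.durations = {}  # step name -> elapsed seconds

    @contextmanager
    def step(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the step raised an error.
            self.durations[name] = time.perf_counter() - start

    def total(self):
        return sum(self.durations.values())


def compare_drills(previous, current):
    """Per-step change in seconds between drills (negative means faster)."""
    return {name: current[name] - previous[name]
            for name in current if name in previous}


if __name__ == "__main__":
    timer = DrillTimer()
    with timer.step("restore DNS"):
        time.sleep(0.1)  # stand-in for the real work
    print(timer.durations, timer.total())
```

Comparing per-step durations, rather than only the total, shows which procedures are getting faster and which ones are still holding the recovery back.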
A disaster recovery plan is essential to your company’s long-term success. Even if you never have to use the plan, the process of putting it together will by its very nature increase the security of your assets and improve your overall business efficiency. Preparing a disaster recovery plan will teach you what data is important and will require you to understand how your business works from a decision-making standpoint. Disaster recovery can be more easily achieved if you follow this simple outline:
- Hold someone accountable for disaster recovery.
- Identify mission-critical data.
- Organize data on a central repository.
- Create procedures for recovering mission-critical servers.
- Create knowledge-based decision-making flowcharts.
- Back up your data on a regular schedule.
- Store your data off-site.
- Test your recovery plan.