If the building in which you work were completely destroyed tomorrow, how quickly could your company’s computer systems be back online? If you’re smart, you said that they would never go offline because a remote facility would instantly take over operations. Until fairly recently, though, such functionality remained a pipe dream for all but the largest and wealthiest companies.
Individual companies plan for disaster in different ways, depending on the likelihood of a disaster, the amount of time that is acceptable for the company’s systems to be offline, and of course, budget. Some of the companies that I’ve worked for had some unusual plans for dealing with a major catastrophe.
An insurance company I used to work for had a warehouse located in a nearby city. They required their computer vendor to keep duplicate machines in stock at all times. The vendor was contractually obligated to deliver these duplicate machines to the warehouse within two hours if the main office were destroyed. The idea was that by using these duplicate machines and the backup tapes that were already being sent to the warehouse each day, the company could be back online within eight hours of when the disaster occurred.
Another company that I worked for had a huge data center and an even bigger tape library. As a contingency against disaster, the company had a duplicate data center in another state that could take over operations immediately. The tape library was designed so that any time data was archived to tape, a duplicate tape was also made. These duplicate tapes were flown to the remote data center twice each day.
While both of these are relatively good disaster recovery plans, they tend to be expensive and far from perfect. In either case, the destruction of the main office would almost certainly lead to some data loss. Today, losing even a few hours’ worth of data isn’t an option for most companies. Imagine what would happen if you lost every e-mail that you had received in the last four hours, or if you had no record of any of the sales that had come in through your company’s Web site in the last four hours.
If even a few hours’ worth of downtime and data loss isn’t an option for your company, then you need to take a look at technologies that will allow your company to stay online, even if the main office were destroyed.
One technology that is promising to make instantaneous disaster recovery more readily available to smaller companies is iSCSI. In case you aren’t familiar with iSCSI, it stands for Internet Small Computer System Interface: essentially, the SCSI command set carried over TCP/IP. The technology was originally designed for storage area networks, running at speeds of up to 10 Gbps over Ethernet. However, iSCSI is capable of running on any IP-based network, including the Internet.
iSCSI is an ideal technology for keeping your network online during a major disaster. For example, if your corporate headquarters is located in Miami, you could theoretically mirror all of your data to a server in Charlotte, NC in real time. The only major technical hurdle that you would have to cross is getting a connection with enough bandwidth.
Of course, the simple act of mirroring data to another facility alone won’t keep the company online during a disaster. All the mirroring process does is guarantee that there is a complete and current copy of all of the company’s data at another facility. The real trick to staying online is to configure the servers in the alternate facility to take over for the servers that have been destroyed.
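To make the idea concrete, here is a minimal sketch of what synchronous mirroring means: a write is acknowledged only after it has reached both the local copy and the remote copy. Real iSCSI mirroring happens at the block level in the operating system or storage stack; the in-memory file objects below are merely stand-ins for the two targets.

```python
import io

class MirroredVolume:
    """Toy model of a synchronously mirrored volume."""

    def __init__(self, local, remote):
        self.local = local    # stands in for the local disk
        self.remote = remote  # stands in for the iSCSI target at the DR site

    def write(self, data):
        # Both copies must succeed before the write is acknowledged;
        # this is what guarantees a complete, current copy off-site.
        self.local.write(data)
        self.remote.write(data)
        self.local.flush()
        self.remote.flush()
        return len(data)

local, remote = io.BytesIO(), io.BytesIO()
vol = MirroredVolume(local, remote)
vol.write(b"order #1001")
assert local.getvalue() == remote.getvalue()  # both sites hold the same data
```

The trade-off, of course, is that every write now waits on the round trip to the remote site, which is exactly why the bandwidth (and latency) of the link matters so much.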
There are a couple of different ways that you could pull this off. The simplest method would be to have duplicate servers at the alternate facility on standby. When disaster strikes, you could power the servers up, connect them to the mirrored data, and then change any necessary DNS entries to get traffic to start flowing to the alternate facility rather than to the primary facility.
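The DNS cutover in the standby approach can be thought of as a runbook expressed in code. The sketch below models it with an in-memory zone table; the hostnames and IP addresses are invented for illustration, and in practice you would make the same change through your DNS provider or a tool such as nsupdate.

```python
# Assumed example addresses for the two facilities (RFC 5737 ranges).
PRIMARY_IP = "203.0.113.10"   # primary facility (e.g., Miami)
STANDBY_IP = "198.51.100.20"  # standby facility (e.g., Charlotte)

# A stand-in for the zone file: hostname -> address record.
zone = {
    "www.example.com":  PRIMARY_IP,
    "mail.example.com": PRIMARY_IP,
}

def fail_over(zone, old_ip, new_ip):
    """Repoint every record that targets the destroyed facility."""
    changed = []
    for name, ip in zone.items():
        if ip == old_ip:
            zone[name] = new_ip
            changed.append(name)
    return changed

changed = fail_over(zone, PRIMARY_IP, STANDBY_IP)
print(sorted(changed))
```

One practical note: keep the TTL on these records short ahead of time, or clients will keep resolving the old address long after you have made the switch.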
Another technique that you could use is long distance clustering. In a long distance cluster configuration, one server acts as a primary node and another server acts as a secondary node. A data link between the two servers communicates the primary server’s health to the secondary server. If the primary server should drop offline and fail to respond within a predetermined amount of time, then the secondary server will take over for the primary server automatically. Because the two nodes act as a single server, there is no need to manually redirect traffic to the secondary server.
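The heartbeat logic described above can be sketched in a few lines. This is a simplified model, not how any particular cluster product is implemented: timestamps are passed in explicitly so the decision logic is testable, whereas real cluster software uses a dedicated network link, quorum rules, and fencing to avoid split-brain situations.

```python
FAILOVER_TIMEOUT = 10.0  # assumed grace period, in seconds

class SecondaryNode:
    """Secondary cluster node that promotes itself when the primary goes silent."""

    def __init__(self):
        self.last_heartbeat = None
        self.active = False  # True once this node has taken over

    def on_heartbeat(self, now):
        # Called each time a heartbeat arrives over the data link.
        self.last_heartbeat = now

    def check(self, now):
        # Called periodically; returns True if this node is (now) active.
        if self.active or self.last_heartbeat is None:
            return self.active
        if now - self.last_heartbeat > FAILOVER_TIMEOUT:
            # Primary missed its heartbeats: take over its role
            # (start services, claim the shared address, etc.).
            self.active = True
        return self.active

node = SecondaryNode()
node.on_heartbeat(now=100.0)
assert node.check(now=105.0) is False  # primary still healthy
assert node.check(now=111.0) is True   # silence exceeded the timeout
```

Because the takeover is automatic and the pair presents itself as a single server, no administrator has to be awake at 3:00 A.M. to flip the switch.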
If iSCSI clusters are something that you believe your company could benefit from, then you will be happy to know that iSCSI works with both Windows 2000 Server clusters and Windows Server 2003 clusters. Be aware, however, that according to an article on TechNet, Microsoft plans to officially support iSCSI only in conjunction with Windows Server 2003 clusters.
If a company is considering implementing iSCSI clustering, then the biggest hurdle to overcome will likely be bandwidth. While iSCSI gets the job done on a reasonable budget, it can be very bandwidth-intensive. Therefore, when planning an iSCSI cluster, you might want to budget for a high-bandwidth leased line. The actual amount of bandwidth that you need varies widely from company to company, depending on the amount of data that’s being transmitted.