Solve this disaster scenario

By editor's response
Take a look at the scenario below and offer your best solution. The most innovative answers will be noted in the Disaster Recovery e-newsletter on May 6, 2003, along with Mike Talon's advice on this particular topic.

A client recently decided to protect a Microsoft Exchange 2000 system by using a real-time, software-based replication system. He procured a second, identical server with the same amount of storage space, and even added 50 percent more disk to accommodate growth.

Finally, to keep the system humming, he installed disk management software to automatically defrag and maintain the assorted volumes. HA systems were set up at the DR site and tested to ensure that the backup systems would take over within fifteen minutes if the production systems failed.

What can go wrong with this solution? Can you spot the major flaws?

Disk Mgmt Software

by LordInfidel In reply to Solve this disaster scena ...

Running disk management software to defrag the drives that Exchange lives on is all sorts of bad.

Exchange has its own built-in utilities for cleaning up and releasing unused space.

In short, defragging an Exchange-based system can and will lead to data corruption. Don't do it.

Defragging Exchange

by clearsmashdrop In reply to Disk Mgmt Software

I second LordInfidel's assertion that you should not run a defrag on Exchange. I learned the hard way that it is not a smart thing to do: I ended up corrupting the database. Oops.

15 mins?

by old and tired.. In reply to Solve this disaster scena ...

If it's truly mission critical, 15 mins is a lifetime... we should be thinking REDUNDANT systems that kick in right away!
Defrag/maintain is EXCHANGE's business today... let it do its job. External software can and will cause untold headaches, if not failure(s).
If there are two systems, I don't see the point in the "added 50% more disk for growth" in the second system. Instead of all this complexity, stick with MIRRORING.

We're talking Windows here

by MikeTalonNYC In reply to 15 mins?

Exchange alone can take 15 minutes to come up after a failover, so that's a tight HA solution if the client can promise 15 minutes. Unfortunately there are no SAFE instant failover tools for most Windows-based applications.

The 50% overhead is just a good idea in many cases, just in case.

Part 2 is on the right track, but don't stop there; the solution requires a bit more thought along those lines.

Mike Talon

Everything and nothing continued...

by Pritesh_Mehta In reply to Solve this disaster scena ...

truncated message continued.....

Taking into account what we know, ensure that there are no filesystem antivirus scanners (pure Exchange mailbox ones are okay), because they leave hooks in the transaction logs, which give you a partially attached database on the target side. Async replication, which all software replication is (that I have ever used), presents a real probability of data loss if there is a source machine failure. The transactions can be confirmed at the target end and possible database corruption can occur. The busier the system, the more likely there will be data loss.
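To make the "busier means more loss" point concrete, here is a minimal toy sketch in Python. It is not how any particular replication product works. The class `AsyncReplicator`, the function `simulate`, and the rates used are all hypothetical. It only models the general idea that an asynchronous source acknowledges writes before the target has them, so anything still queued when the source dies is gone.

```python
from collections import deque


class AsyncReplicator:
    """Toy model: the source acknowledges writes before the target has them."""

    def __init__(self):
        self.source_log = []      # writes acknowledged to clients at the source
        self.in_flight = deque()  # writes queued for the target, not yet applied
        self.target_log = []      # writes the target has actually applied

    def write(self, record):
        # Asynchronous: acknowledge immediately, replicate later.
        self.source_log.append(record)
        self.in_flight.append(record)

    def replicate(self, max_records):
        # Ship at most max_records queued writes per tick (assumed link capacity).
        for _ in range(min(max_records, len(self.in_flight))):
            self.target_log.append(self.in_flight.popleft())

    def fail_source(self):
        # Whatever is still queued when the source dies never reaches the target.
        lost = list(self.in_flight)
        self.in_flight.clear()
        return lost


def simulate(writes_per_tick, link_capacity, ticks):
    """Run the toy model for a number of ticks, then fail the source."""
    rep = AsyncReplicator()
    seq = 0
    for _ in range(ticks):
        for _ in range(writes_per_tick):
            seq += 1
            rep.write(f"txn-{seq}")
        rep.replicate(link_capacity)
    return rep.fail_source()


# Hypothetical loads: a quiet system the link keeps up with, and a busy one it does not.
quiet = simulate(writes_per_tick=2, link_capacity=5, ticks=10)
busy = simulate(writes_per_tick=8, link_capacity=5, ticks=10)
print(len(quiet), len(busy))  # prints "0 30"
```

In the quiet run the link drains the queue every tick, so nothing is lost. In the busy run the backlog grows by three writes per tick, and all thirty are lost when the source fails.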

...The other points about defragging Exchange are quite correct: don't do it! Also, the 15-minute failover can be reduced if you have automatic failure detection in the replication software. That too can cause problems, namely split-brain syndrome: the target site incorrectly assumes a failure has occurred (because of a network glitch) and takes over the active machine name while the source machine is actually still okay. When the network comes back you have 2 machines with the same ID and, hey presto, lots of problems!!
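One common way to reduce the split-brain risk is to require several consecutive missed heartbeats and then ask an independent third machine (a "witness") whether it can still see the source before taking over. The sketch below is only an illustration of that idea under those assumptions. `FailoverMonitor`, `missed_limit`, and the returned state names are hypothetical and do not describe any specific product's behaviour.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Decides whether the target should take over the source's identity."""

    missed_limit: int = 3  # consecutive missed heartbeats before suspecting failure
    missed: int = 0        # running count of consecutive misses

    def observe(self, heartbeat_ok: bool, witness_sees_source: bool) -> str:
        if heartbeat_ok:
            self.missed = 0
            return "standby"
        self.missed += 1
        if self.missed < self.missed_limit:
            return "standby"   # could be a brief network glitch
        if witness_sees_source:
            return "hold"      # source is alive; only our link to it is down
        return "failover"      # neither we nor the witness can reach the source


# Replication link drops, but the witness can still reach the source.
glitch_monitor = FailoverMonitor()
glitch = [glitch_monitor.observe(False, True) for _ in range(4)]

# Both the target and the witness lose the source.
outage_monitor = FailoverMonitor()
outage = [outage_monitor.observe(False, False) for _ in range(4)]

print(glitch)  # ['standby', 'standby', 'hold', 'hold']
print(outage)  # ['standby', 'standby', 'failover', 'failover']
```

The trade-off is the one discussed in this thread: waiting for more missed heartbeats and a witness check makes a false takeover less likely, but it adds to the failover time.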

Everything and Nothing

by Pritesh_Mehta In reply to Everything and nothing co ...

Ok let me explain the title....
Nothing is WRONG with trying to replicate Exchange 2000, but you have to be damn sure you follow the steps below. I am a consultant working for a storage reseller and solutions provider. I specialise in DR and replication, as well as backup, SAN, etc.

As Mike quite rightly said, there are massive costs involved with hardware replication: you need a 100MB link to keep the in-order write sequence, with the target end confirming its I/O write before the source end can do any more work. That said, something like Legato Co-Standby or their new AAM product works magnificently for this. You also need to prepare the disks at the source and target ends, which usually requires a re-install of the application in question, so lots of unwanted downtime ahead...

To combat this, software replication at about 5 grand per pair is considerably cheaper to install and configure, requires no reinstallation of the application, and can be done in a few days (with testing, of course!!!). Software replication can be throttled and scheduled to update the target only periodically.

Again, Mike was correct in saying that async or near-sync replication can cause corruption at the target end if the source end fails unexpectedly (usually a failure, not a clean shutdown).

one note

by MikeTalonNYC In reply to Everything and Nothing

If memory serves me well (sorry, just finished watching the TiVo of King of Iron Chefs), I mentioned that data corruption could occur in improperly configured async systems. Most of these solutions offer write-order preservation and other protection methodologies to prevent corruption even in unexpected shutdowns, once they are properly configured.
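As a rough illustration of what write-order preservation means, here is a small Python sketch. It assumes each replicated write carries a sequence number assigned at the source. The class `OrderedApplier` and the payload labels are hypothetical, and real products will differ in the details. The target applies writes strictly in source order and holds back anything that arrives early, so after an unexpected source failure the target holds a consistent prefix of the source's writes and not a scrambled mix.

```python
class OrderedApplier:
    """Applies replicated writes on the target strictly in source order."""

    def __init__(self):
        self.next_seq = 1   # sequence number the target expects next
        self.pending = {}   # writes that arrived early, keyed by sequence number
        self.applied = []   # payloads applied so far, in order

    def receive(self, seq, payload):
        if seq < self.next_seq or seq in self.pending:
            return          # duplicate delivery; ignore it
        self.pending[seq] = payload
        # Drain every write that is now contiguous with what was already applied.
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


# Writes arrive out of order over the network but are applied in order.
applier = OrderedApplier()
for seq, payload in [(2, "log B"), (1, "log A"), (4, "db page"), (3, "log C")]:
    applier.receive(seq, payload)
print(applier.applied)  # ['log A', 'log B', 'log C', 'db page']

# If the source dies before write 2 is sent, write 3 is never applied.
cut_short = OrderedApplier()
cut_short.receive(1, "log A")
cut_short.receive(3, "log C")
print(cut_short.applied)  # ['log A']
```

The second case still loses data (writes 2 and 3), but the target is left at a point the source actually passed through, which is the property that helps a database recover cleanly.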

Don't want the vendors looking to kill me =)

Mike Talon

Keep going

by MikeTalonNYC In reply to Everything and nothing co ...

Modern software-based replication solutions can protect data even during source failure; real-time asynchronous data-transfer solutions have come a long way in just the last 18 months or so.

However, you are still just about nailing the major issue addressed in the solution explanation I'll present next week. Keep searching!

Mike Talon

Here's a better Disaster Scenario

by HAL 9000 Moderator In reply to Solve this disaster scena ...

This is a real one, so how about running with this?

A server that is less than 2 weeks old fails, and all the computers in the business also fail. When the computer is returned to the maker, the customer is told that it has suffered a lightning strike, that it is not covered by the manufacturer's warranty, and that it should be covered by their insurance company. When an insurance claim is lodged, the insurance company requires an independent assessment of the computer and sends it off to their nominated repairer for an inspection report.

The company that receives the computer for the insurance report first opens the case and finds a mess of charred circuitry and carbon lying on the bottom of the case, all consistent with a severe power surge or lightning strike. But out of curiosity, the repairer puts a power supply tester on the main lead and applies power, not expecting anything to happen, and the tester immediately blows up. Upon closer inspection, mains voltage is found coming out of the 12 volt and 5 volt supplies as well as the 3 volt lines. It is immediately apparent that mains voltage has caused all the damage, and upon inspection the repairer finds an unbranded 250 watt power supply trying to drive a dual Xeon processor motherboard with 4 IDE devices and 8 SCSI devices connected to the SCSI controller built into the motherboard.

The repairer then proceeds to the business to assess the rest of the damage and finds that everything has suffered the same fate, from the humble hubs to every computer on the network, all 250 machines.

Now here are the 2 questions:

1. What would you do: go along with the lie, knowing that if found out you would be charged with fraud? Or report the truth and then expect to spend a long time in court as a witness for the business to recover costs, both real and punitive?

2. How would you go about fixing this situation if you told the truth?

Is this real?

by MikeTalonNYC In reply to Here's a better Disaster S ...

Did this happen to your company? Under the circumstances, I'd suggest the first thing you'd want to do is contact a qualified attorney.

Meanwhile, if you'd like this to be a case study for a future scenario in this column, let's talk!

Mike Talon
