Greetings all,

My company is attempting to replicate from one HP EVA SAN array to another HP EVA SAN array across the WAN. We have a metro Ethernet connection between the two sites with 1 Gbps of shared bandwidth. We share that bandwidth with our other business units, with no QoS in place, but we have been told that the pipe has never been completely saturated and that we're not rate limited. The SAN arrays are on 4 Gbps Fibre Channel Brocade switches. Two MPX110 devices carry the data from Fibre Channel onto Ethernet. Each MPX handles replication for its own redundancy groups, and although each one has two Ethernet and two Fibre Channel ports, we only use one of each. Each MPX110 replicates over its own path to its counterpart on the other side. It is my understanding that they negotiate a tunnel between them using Fibre Channel over IP (FCIP). Each MPX is on its own 6509; the 6509s uplink to a 3750, which goes across the metro Ethernet to a 3560 on the other side, then up to a 3560 acting as the core, and out to two 3560s with an MPX on each one.

Now the problem: although we have 1 Gbps of bandwidth, each MPX will only use about 13 Mbps of it, and we've verified this with iperf. Each connection only takes 13 Mbps, and parallel tests show each connection getting 13 Mbps. The HP engineer told us that at >5Mbps we get approximately 1.3 Mbps of actual data, which would mean FCIP has 80% overhead. Can that be right? The bigger problem is that after running for several hours they eventually just die and have to be rebooted to start replicating again. They're already on the latest firmware. The only error we get from the statistics screen of the MPXs says they're getting TCP timeouts.
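To sanity-check the 80% overhead figure, here is a rough back-of-the-envelope sketch of FCIP encapsulation overhead. It assumes a 2048-byte SCSI payload per Fibre Channel frame, the 28-byte encapsulation header plus SOF/EOF words from RFC 3643, plain IPv4/TCP headers without options, a 1500-byte MTU, and no VLAN tag or compression. None of these values were read from the MPX110 itself, so treat the result as an estimate only.

```python
# All sizes below are assumptions for a rough estimate, not values taken
# from the MPX110 or HP documentation.
FCIP_ENCAP_HEADER = 28   # FC frame encapsulation header (RFC 3643)
SOF_EOF = 4 + 4          # start-of-frame and end-of-frame words
FC_HEADER = 24           # Fibre Channel frame header
FC_CRC = 4               # Fibre Channel CRC
TCP_IP = 20 + 20         # TCP and IPv4 headers, no options
ETHERNET = 14 + 4 + 20   # Ethernet header + FCS + preamble/inter-frame gap
MSS = 1460               # TCP payload per segment with a 1500-byte MTU


def fcip_overhead(fc_payload: int = 2048) -> float:
    """Return the fraction of wire bytes that are not SCSI payload."""
    fcip_bytes = FCIP_ENCAP_HEADER + SOF_EOF + FC_HEADER + fc_payload + FC_CRC
    # Treat the FCIP byte stream as continuous, so every MSS-sized chunk
    # carries one set of TCP/IP/Ethernet headers.
    wire_bytes = fcip_bytes * (MSS + TCP_IP + ETHERNET) / MSS
    return 1 - fc_payload / wire_bytes


print(f"2048-byte payload: {fcip_overhead():.1%} overhead")
print(f"512-byte payload:  {fcip_overhead(512):.1%} overhead")
```

Under these assumptions the header overhead stays well below 80% even for small frames, which suggests the gap between line rate and delivered data would have to come from somewhere other than encapsulation alone.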

I've performed captures on both sides' MPXs, and the errors I see in a 60-second sample are FCP malformed packets (~4300), duplicate ACKs (~41), previous segment lost (~3), and fast retransmissions (~3). When HP was questioned about the FCP malformed packets, they stated that they use a proprietary protocol and that Wireshark wouldn't be able to decode it. I've since searched for this protocol but can find no references to it anywhere. The other errors are so minor and few that it's hard to believe they're impacting the data stream much, if at all.
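One way to check whether ~3 lost segments per minute could plausibly hold a single connection to 13 Mbps is the Mathis et al. approximation for loss-limited TCP throughput. The sketch below assumes a 1460-byte MSS, the ~36 ms ping time reported later in this thread as the RTT, and that the 3 "previous segment lost" events are spread over 60 seconds of a 13 Mbps flow. These are assumptions, and the formula only roughly models Reno-style TCP, which the MPX110 may or may not use.

```python
import math


def mathis_ceiling_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate loss-limited throughput of one TCP flow (Mathis et al.)."""
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


# Assumptions: 1460-byte MSS, ~36 ms RTT, 3 losses in 60 s of a 13 Mbps flow.
SEGMENTS_PER_MINUTE = 13e6 * 60 / 8 / 1460
loss = 3 / SEGMENTS_PER_MINUTE
ceiling = mathis_ceiling_bps(1460, 0.036, loss)
print(f"loss rate ~{loss:.1e}, loss-limited ceiling ~{ceiling / 1e6:.0f} Mbps")
```

With these inputs the loss-limited ceiling comes out several times higher than 13 Mbps, which is consistent with the impression that the few retransmissions alone are not what caps each connection.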

I'll include a small sample of the captures, if it lets me.

Thanks in advance for your assistance.

The Tech Dictator

by TechDictator In reply to FCIP issues with SAN repl ...

I discovered that if I disable the FCP decode, Wireshark decodes the traffic correctly as FCIP.

We applied a QoS config to mark SAN replication traffic as DSCP EF and have seen consistent ping times of ~36 ms between sites, with bandwidth climbing as high as 45 Mbps on the 1 Gbps link. They still fail after replicating for a few hours; last time we watched them replicate for 12 hours and then fail. The TCP timer exceed counter seems to indicate that this is the problem, but I have nothing significant in the Wireshark captures to support it.
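The ~13 Mbps per-connection ceiling lines up suspiciously well with a single TCP connection that can only keep 64 KB in flight over a ~36 ms round trip. The sketch below computes that window-limited throughput and the bandwidth-delay product needed to fill the 1 Gbps link. Whether the MPX110 (or the iperf test) actually uses a 64 KB window without window scaling is an assumption worth verifying, not something confirmed in this thread.

```python
def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Max throughput of one TCP connection limited to window_bytes in flight."""
    return window_bytes * 8 / rtt_s


def window_needed_bytes(target_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return target_bps * rtt_s / 8


RTT = 0.036  # ~36 ms ping time measured between the sites
print(f"64 KB window at 36 ms: {window_limited_bps(65535, RTT) / 1e6:.1f} Mbps")
print(f"window to fill 1 Gbps: {window_needed_bytes(1e9, RTT) / 1e6:.1f} MB")
```

If the numbers match, the per-connection limit would be a property of the TCP window and RTT rather than of the metro Ethernet link, so checking the window size and window scaling settings on the MPX110s could be a useful next step.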

HP has decided that the MPX110 on the far side needs to be replaced. I'll post an update after that's done.

