EMC's AX4 failover features work as advertised, much to the delight of Scott Lowe. Sometimes, it's nice to celebrate the small successes!
For a few months now, I've been writing blog posts about my experience with EMC's AX4 entry-level iSCSI SAN. Westminster College purchased two AX4 arrays - a SAS unit with dual redundant controllers and an attached SATA unit - for a total of 13.8TB raw capacity. Since installation, we have slowly been moving applications and services off direct-attached storage and onto the SAN. The SAN now hosts our primary file server, our Exchange 2007 mailbox stores, a number of VMware virtual machines, and, as of last week, three administrative database applications.
Last week, while we were moving our administrative databases to the SAN, we experienced a problem with one of the controllers. Although it was still passing traffic, it was showing up in the array management software in an "unmanaged" state. Since we were actively performing configuration tasks for our database move, I wanted to make sure that this issue got resolved, so I contacted EMC support.
During the support session, the analyst asked if he could reboot the controller. I asked him what the impact of this operation would be, and he indicated that any connections to this controller would be severed. I knew for a fact that some of our servers' iSCSI connections were running through the controller in question. However, everything we have connected to the SAN is configured for high availability. In theory, a connection that is severed for any reason will reestablish itself through the second controller. As we were at a safe point in the day, I told the support analyst to have at it.
Good news on two fronts:
- The reboot corrected the problem.
- All of the connected servers failed over to the second controller with no noticeable interruption in service. In fact, I was remoted in to one of the connected servers when the first controller was rebooted. As soon as the first controller went down, EMC's PowerPath software popped open and told me that redundancy to the controller had been lost and that traffic was moving through controller B now. A beautiful thing, indeed!
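
For readers who want to picture what happened during the reboot, here is a minimal Python sketch of active/passive path failover. It is only an illustration of the general idea, not PowerPath's actual logic or API: the `Path` and `MultipathDevice` classes, the `send` method, and the event message are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    """One route from a server to the array (hypothetical model)."""
    controller: str
    alive: bool = True


@dataclass
class MultipathDevice:
    """A LUN reachable over several paths, with one active at a time."""
    paths: list
    active: int = 0
    events: list = field(default_factory=list)

    def send(self, block: str) -> str:
        # Before each I/O, check the active path and fail over if it is down.
        if not self.paths[self.active].alive:
            self._fail_over()
        return f"{block} via controller {self.paths[self.active].controller}"

    def _fail_over(self) -> None:
        failed = self.paths[self.active].controller
        # Pick the first surviving path, in list order.
        for index, path in enumerate(self.paths):
            if path.alive:
                self.events.append(
                    f"redundancy lost on controller {failed}; "
                    f"traffic now using controller {path.controller}"
                )
                self.active = index
                return
        raise OSError("no surviving path to the array")


# Simulate the controller reboot: I/O continues on controller B.
lun = MultipathDevice([Path("A"), Path("B")])
before = lun.send("write-1")
lun.paths[0].alive = False      # controller A goes down for its reboot
after = lun.send("write-2")
```

In this toy model the application calling `send` never sees an error; it only finds a notice in `events`, which is roughly the experience described above. Real multipathing software also has to detect the failure, retry in-flight I/O, and restore the original path once the controller returns, none of which is modeled here.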
Now, some of you will say, "Why so happy? Isn't that what you expected?" Well, yes. But, sometimes, it's just nice to see something work the way that it's supposed to without any problems! We spent a lot of time planning the architecture to get to this point and this was a great, safe way to test our work in real life.