Storage

Learn how a SAN fibre mesh works

TechRepublic member Donovan Colbert provides newbies with a high-level overview of how a SAN fibre mesh logically works.

SAN solutions are ideal platforms for high availability; however, because of how much they cost, they have a fairly narrow range of application.

SAN misconfiguration is a terrible waste of data center and IT budget resources. I often use the analogy that drive space on the corporate desktop is the equivalent of rural land and housing -- it's sprawling, huge, and relatively inexpensive. If that is true, then SAN storage is housing in the heart of Downtown Tokyo. It is, megabyte to megabyte, some of the most expensive real estate in your entire data center.

If you know how to configure and maintain your SAN, you'll be more confident in your ability to quickly and transparently keep the user experience optimized and reliable -- possibly more than with any other solution deployed in your data center.

Introduction

This column is a high-level overview of how a SAN fibre mesh logically works. When reading the column, keep in mind that most SAN solutions will be engineered around this basic design, although I don't go into specific details, and your SAN solution may not adhere to everything I discuss. Also, I'll use terminology commonly applied to EMC SAN solutions; other vendors may have different names for the various components, but the general concepts should be the same.

People often think of a SAN as simply a large, fast, external hard drive. This is accurate enough at the most basic level of understanding, but a SAN is much more than that -- and using a SAN simply for extremely fast direct attached storage (DAS, sometimes called SCSI shared disk) only scratches the surface of what SAN devices can do. In fact, SAN devices really shine as components of a comprehensive high-availability solution. An in-depth discussion of those advantages and how to leverage them is beyond the scope of this column, but knowing this will help you understand that a SAN, from top to bottom, is one of the most highly redundant devices you'll encounter in any modern data center.

Redundancy and fibre switches

This redundancy begins at either termination point. A termination point is either the Disk Array Enclosure (and the drives inside that DAE) or the connected server with a Fibre Host Bus Adaptor (HBA). We'll start from the connected, or target, server.

A target server is the server that is connected to the SAN by an HBA. The HBA is a regular host bus adaptor (pretty much like a SATA, IDE, or SCSI PCI host bus adaptor) except that it has a fibre connector instead of a traditional multi-pin or edge copper connector. Another important distinction is that the HBA card generally has two ports (HBA0 and HBA1), or there are two single-port HBA cards installed (again, HBA0 and HBA1). There may be even more, but for our purposes, we'll assume that there are two HBA cards, 0 and 1, installed in our target server. A fibre cable runs from these HBAs to the next components in the SAN architecture: the fibre switches.

Fibre switches are switches that support fibre instead of copper cable. For the sake of redundancy, there will be two unique switches per SAN; for this example, we'll call these FC01 and FC02. The fibre that runs from HBA0 on the target server will go into port 1 on FC01; the fibre that runs from HBA1 will go to FC02. If either HBA, either cable, or either switch goes out, a redundant path remains to the SAN.

The next step in our data path is at the heart of the SAN itself. SANs generally have two redundant storage processors: Storage Processor A (SP-A) and Storage Processor B (SP-B). Each storage processor has two redundant fibre connectors: FE0 and FE1. To help clarify, think of SP-A FE0 and FE1 (each an individual connector for a fibre cable) and SP-B FE0 and FE1 (also each an individual connector for a fibre cable). Each storage processor has two fibre connectors, so there are four cables attached in total across the pair.

I think it may be clearer if I reverse the way we look at the cable drops on this segment between these components. The four fibre drops mentioned above go to the fibre switches, FC01 and FC02. You can also consult the diagram in Figure A.

Figure A

SP-A FE0 leads to a port on the switch FC02, and SP-A FE1 leads to a port on the switch FC01 (these are switched on purpose).

SP-B FE0 leads to a port on the switch FC01, and SP-B FE1 leads to a port on the switch FC02.

HBA0 on the target server goes to port 1 on FC01, and HBA1 goes to port 1 on FC02.

By switching (basically crossing over) the paths from the two ports on SP-A and the two ports on SP-B to the switches, and by directing each HBA on the target server to a dedicated switch, you build a very high level of redundancy into this design, which you'll often hear SAN professionals refer to as a "mesh." In this example, a port on each storage processor (SP-A and SP-B) could fail, a switch could fail, and the HBA cabled to that switch could fail, and you would still retain a path to the SAN from the target server. This is pretty impressive redundancy, and it's one of the reasons why SAN storage is so popular as a component of a high-availability solution.
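To make the mesh concrete, here is a minimal Python sketch (using the hypothetical device names from this example) that enumerates the four logical paths and shows that the correlated failures described above still leave two working paths to the SAN.

```python
# A minimal sketch of the example mesh (hypothetical device names).
# Each tuple is one logical path: HBA -> fibre switch -> storage-processor port.
paths = [
    ("HBA0", "FC01", "SP-A FE1"),
    ("HBA0", "FC01", "SP-B FE0"),
    ("HBA1", "FC02", "SP-A FE0"),
    ("HBA1", "FC02", "SP-B FE1"),
]

def surviving_paths(failed):
    """Return the paths that avoid every failed component."""
    return [p for p in paths if not any(component in p for component in failed)]

# Fail one switch, the HBA cabled to it, and one port on each storage processor:
failures = {"FC02", "HBA1", "SP-A FE0", "SP-B FE1"}
for path in surviving_paths(failures):
    print(" -> ".join(path))
# HBA0 -> FC01 -> SP-A FE1
# HBA0 -> FC01 -> SP-B FE0
```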

This redundancy does not end at the two storage processors. The storage processors connect through redundant backplanes (usually high-speed 2- or 4-gigabit copper) that continue the mesh to the individual DAEs, which are populated in turn with high-speed, highly reliable SCSI drives. The drives in the DAEs can then be grouped in various configurations at different common RAID levels.

Now that everything is hooked up, you could simply carve some of your drives into a RAID group and a LUN, assign the LUN to the target server, configure the server, format the drives, and be in business. In fact, it isn't uncommon to encounter a SAN configured just this way. But if you stop there, all traffic across your switches will be broadcast traffic, and if you've got a lot of target servers, a lot of LUNs, and a lot of traffic, this can have a significant impact on your overall throughput.

One reason organizations spend a boatload of money on a SAN is speed. The answer to this broadcast problem is zoning; unfortunately, zoning is one of the more complex aspects of SAN configuration, which may be why so many people either skip it entirely or do it wrong. Zoning is done at the switch; you basically define paths, or routes, from an HBA through the switch to the storage processor.

WWNs

Everything I've discussed so far is intended to facilitate two things: high redundancy and relatively easy zoning. You would now connect to your switch and log in. Each HBA port installed in the server has a unique address called a World Wide Name (WWN); WWNs look like really long MAC addresses, which is effectively what they are.

There are utilities that will tell you what WWN you are working with, but a simple and effective shortcut is to know which port on the switch runs to which port on the HBA card. This just makes common sense. I recommend labeling each end of the cable, even on short runs. Once you know the port on the switch, you can access the switch itself, find that port, and find the WWN.

If you inherit a SAN and the cable management wasn't well done, you'll probably need a utility to determine what WWN is hooked up to what port on your switch. It is much easier to plan carefully ahead of time and clearly label things; this is why we've put each HBA into Port 1 on each respective switch. It is fairly simple to open up our switch, trace port 1 back, and determine which HBA WWN is connected to it.
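If the cabling is well labeled, the port-to-HBA map can live in something as simple as a lookup table. Below is a minimal Python sketch of that idea; the server name and WWNs are made up, standing in for the real values you would read from the switch's port view or from an HBA utility on the server.

```python
# Hypothetical cable map: (switch, port) -> (server, HBA, WWN).
# The WWNs here are invented; substitute the real ones from your fabric.
cable_map = {
    ("FC01", "port 1"): ("mysrv01", "HBA0", "10:00:00:00:c9:aa:bb:01"),
    ("FC02", "port 1"): ("mysrv01", "HBA1", "10:00:00:00:c9:aa:bb:02"),
}

def describe(switch, port):
    """Report which server HBA (and WWN) is cabled to a given switch port."""
    server, hba, wwn = cable_map[(switch, port)]
    return f"{switch} {port} -> {server} {hba} ({wwn})"

print(describe("FC01", "port 1"))  # FC01 port 1 -> mysrv01 HBA0 (10:00:00:00:c9:aa:bb:01)
```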

Zoning

Zoning is a matter of creating an alias name and defining the correct path from an HBA port through the switch to the storage processor. An important concept to keep in mind here is the redundancy: you've got two switches, two HBA ports, and two storage processors with four paths to the switches. You're going to have to log into each switch individually and create a unique alias for each HBA port to each port on both storage processors.

Generally speaking, zoning requires three steps: (1) making an Alias and adding the correct WWN to it, (2) creating a zone and adding the correct Alias to it, and (3) making these part of the zone configuration and making them active. You'll do this two times per switch -- once for HBA0 and once for HBA1 on Storage Processor A, and once for HBA0 and once for HBA1 on Storage Processor B.

When performing the zoning, remember that FE0 and FE1 are reversed on Storage Processor A and Storage Processor B. Conceptually, for each server, you should understand that you'll log into your first switch, create an alias, create a zone, and add the alias to the zone once for HBA0 and again for HBA1. You'll repeat the entire process on the second switch, but FE0 and FE1 will be reversed on this switch. Once the steps are completed, all traffic between our target server and the SAN is routed and not broadcast, cutting down tremendously on unnecessary chatter on our SAN.

EMC recommends a pretty strict naming convention for creating alias names. The recommended naming scheme is "Servername_HBA#_SANname_Storage Processor_Port". For example, mysrv01_hba0_mysan01_SPA_FE0 is the kind of alias name you would expect to see. The benefit of this naming convention is two-fold: It is very descriptive of what you are working with, and anyone else familiar with this naming convention will quickly be able to determine what is going on with a particular alias.
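As a rough illustration, here is a small Python sketch that builds the alias names for one server using this convention. It assumes the crossover cabling from this example and that each HBA is zoned on the switch it is physically cabled to; the server, SAN, and WWN values are made up.

```python
# Hypothetical values; substitute your own server name, SAN name, and WWNs.
server, san = "mysrv01", "mysan01"
wwns = {"hba0": "10:00:00:00:c9:aa:bb:01", "hba1": "10:00:00:00:c9:aa:bb:02"}

# Per the crossover cabling in this example: FC01 carries SP-A FE1 and SP-B FE0,
# while FC02 carries SP-A FE0 and SP-B FE1. HBA0 is cabled to FC01, HBA1 to FC02.
switch_layout = {
    "FC01": {"hba": "hba0", "sp_ports": [("SPA", "FE1"), ("SPB", "FE0")]},
    "FC02": {"hba": "hba1", "sp_ports": [("SPA", "FE0"), ("SPB", "FE1")]},
}

for switch, layout in switch_layout.items():
    print(f"--- aliases to define on {switch} ---")
    hba = layout["hba"]
    for sp, fe_port in layout["sp_ports"]:
        # Servername_HBA#_SANname_StorageProcessor_Port
        alias = f"{server}_{hba}_{san}_{sp}_{fe_port}"
        print(f"{alias}  (HBA WWN {wwns[hba]})")
```

Running this prints two alias names per switch, which lines up with the two zoning passes per switch described above.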

Conclusion

Understanding these basic concepts will give you a tremendous head start in learning how to properly administer, manage, and deploy SAN solutions; however, there is still no substitute for attending official SAN training provided by your SAN vendor. If your organization is considering deploying a SAN solution, for the amount of money it will invest in even an entry-level SAN, it pays incredible dividends to bundle training into the purchase price.


About

Donovan Colbert has over 16 years of experience in the IT Industry. He's worked in help-desk, enterprise software support, systems administration and engineering, IT management, and is a regular contributor for TechRepublic. Currently, his profession...

8 comments
george.hwa

Very helpful article. A detailed illustration or specific listing for the zoning part would be great to solidify the concept.

deepunix

That's really a great article I read about SAN. Thank you very much TechRepublic. "DPL"

The 'G-Man.'

HA systems would have a redundant server involved for failover as well, thus keeping the workload running.

kmdennis

This is a very nice and simplified article, easy to understand. Also, I do not believe any company that would get this kind of setup would not have a redundant server in the process. It would just not make any sense. It would be kind of like buying a Rolls Royce and putting just basic insurance on it. You would want to cover everything with complete comprehensive insurance. Similarly, you would cover all bases with a failover server. It is a great article!

dcolbert

For the sake of this article, I wanted to focus on the SAN itself. Getting into how a HA solution would work with the SAN described in this article would have been outside of the scope of what I was trying to cover. I was thinking about doing a series, though. :) And it is worth mentioning that even without clustering or other high availability solutions, if you lose the server, it is pretty easy to build another server, attach it to the lun, and you're back up with minimal risk of data loss. "Poor man's" failover. :)

don.howard

If absolute HA is not needed, it is relatively easy to replace a server in a couple of hours. Contrast this with the many, many hours required to restore a large system from backup, localize the database, ensure permissions, etc. This is where SAN really shines from a recovery standpoint.

The 'G-Man.'

Have the images / data on the lun(s) - different lun for each so the server is just the processing unit.

dcolbert

I think we've touched on something here... You've got to think about what your goals are with a SAN. For us, DB performance is a major driver. For some, maximizing storage space is the main driver. For others, high availability, clustering and disaster recovery are what motivate SAN investments. Now, the nice thing about a SAN is that once you've made the initial investment, for *any* of these reasons, the other benefits are relatively painless to leverage if you find the budget or need to do so - and most of those benefits are *worth* leveraging if you can do so. In my current environment, I have two relatively expensive mid-tier SAN solutions - and we've configured these with a focus on performance and reliability - oriented almost exclusively to Database solutions. We do not offer an SLA that requires automatic hot failover - it is something we certainly would eventually like to offer, but at this point, the idea that a server failure could easily be resolved by swapping out a new server and attaching it to the appropriate LUN is a good solution for us. Instead, the speed and redundancy of the SAN is what makes it work so well for us.
