SANs are ideal platforms for high-availability solutions; however, because of their cost, they have a fairly narrow range of application.
SAN misconfiguration is a terrible waste of data center and IT budget resources. I often use the analogy that drive space on the corporate desktop is the equivalent of rural land and housing -- it's sprawling, huge, and relatively inexpensive. If that is true, then SAN storage is housing in the heart of downtown Tokyo. It is, megabyte for megabyte, some of the most expensive real estate in your entire data center.
If you know how to configure and maintain your SAN, you'll be more confident in your ability to quickly and transparently keep users' experience optimized and reliable at peak levels -- possibly more than any other solution deployed in your data center.
This column is a high-level overview of how a SAN fibre mesh logically works. When reading the column, keep in mind that most SAN solutions will be engineered around this basic design, although I don't go into specific details, and your SAN solution may not adhere to everything I discuss. Also, I'll use terminology commonly applied to EMC SAN solutions; other vendors may have different names for the various components, but the general concepts should be the same.
People often think of a SAN as simply a large, fast, external hard drive. This is accurate enough at the most basic level of understanding, but a SAN is much more than this -- and using a SAN simply as extremely fast direct attached storage (DAS, sometimes called SCSI shared disk) only scratches the surface of what SAN devices can do. In fact, SAN devices really shine as components of a comprehensive high-availability solution. An in-depth discussion of those advantages and how to leverage them is beyond the scope of this column, but knowing this will help you understand why a SAN is, from top to bottom, one of the most highly redundant designs you'll encounter in any modern data center.
Redundancy and fibre switches
This redundancy begins at either termination point. A termination point is either the Drive Array Enclosure (and the drives inside that DAE) or the connected server with a Fibre Host Bus Adaptor (HBA). We'll start from the connected, or target, server.
A target server is the server that is connected to the SAN by an HBA. The HBA is a regular host bus adaptor (it is pretty much like a SATA, IDE, or SCSI PCI host bus adaptor) except it has a fibre connector instead of a traditional multi-pin or edge copper connector. Another important distinction is that the HBA card generally has two ports (HBA0 and HBA1), or there are two single-port HBA cards installed (again, HBA0 and HBA1). There may be even more, but for our purposes, we'll assume that there are two HBA cards, 0 and 1, installed in our target server. A fibre cable is run from each of these HBAs to the next components in the SAN architecture: the fibre switches.
Fibre switches are switches that support fibre instead of copper cable. For the sake of redundancy, there will be two unique switches per SAN; for this example, we'll call these FC01 and FC02. The fibre that runs from HBA0 on the target server will go into port 1 on FC01; the fibre that runs from HBA1 will go to FC02. If either HBA, either cable, or either switch goes out, a redundant path remains to the SAN.
The next step in our data path is at the heart of the SAN itself. SANs generally have two redundant storage processors: Storage Processor A (SP-A) and Storage Processor B (SP-B). Each storage processor has two redundant fibre connectors, FE0 and FE1, and each connector takes an individual fibre cable. That gives four connectors in total (SP-A FE0, SP-A FE1, SP-B FE0, and SP-B FE1) with four cables attached.
I think it may be clearer if I reverse the way we look at the cable drops on this segment between these components. The four fibre drops mentioned above go to the fibre switches, FC01 and FC02, as listed here. You can also consult the diagram in Figure A.
SP-A FE0 leads to a port on the switch FC02, and SP-A FE1 leads to a port on the switch FC01 (these are switched on purpose).
SP-B FE0 leads to a port on the switch FC01, and SP-B FE1 leads to a port on the switch FC02.
HBA0 on the target server goes to port 1 on FC01, and HBA1 goes to port 1 on FC02.
By switching (basically crossing over) the paths from the two ports on SP-A and the two ports on SP-B to the switches, and having each HBA on the target server directed to a dedicated switch, you have a very high level of redundancy built into this design, which you'll often hear referred to as a "mesh" by SAN professionals. In this example, a port on each storage processor (SP-A and SP-B) could fail, a switch could fail, and an HBA could fail, and you would still retain a path to the SAN from the target server. This is pretty impressive redundancy, and it's one of the reasons why SAN storage is so popular as a component of a high-availability solution.
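To make the mesh easier to reason about, here is a minimal Python sketch that models the cabling described above and lists the paths that survive a set of failures. It assumes the two switches are not linked to each other, so an HBA can only reach the storage processor ports cabled to its own switch; the function name and the port labels (SPA_FE0 and so on) are illustrative, not anything a switch or array exposes.

```python
# Cabling taken from the text; labels such as "SPA_FE0" are illustrative.
HBA_TO_SWITCH = {"HBA0": "FC01", "HBA1": "FC02"}
SP_PORT_TO_SWITCH = {
    "SPA_FE0": "FC02",
    "SPA_FE1": "FC01",
    "SPB_FE0": "FC01",
    "SPB_FE1": "FC02",
}


def surviving_paths(failed=()):
    """Return (hba, switch, sp_port) paths that avoid every failed component.

    Assumption: no inter-switch link, so a path exists only when the HBA
    and the storage processor port are cabled to the same switch.
    """
    failed = set(failed)
    paths = []
    for hba, hba_switch in HBA_TO_SWITCH.items():
        for sp_port, sp_switch in SP_PORT_TO_SWITCH.items():
            if hba_switch != sp_switch:
                continue  # not on the same switch, so no path
            if failed & {hba, hba_switch, sp_port}:
                continue  # some component on this path is down
            paths.append((hba, hba_switch, sp_port))
    return paths


# Healthy mesh, then one HBA, one switch, and one port per SP all down.
print(surviving_paths())
print(surviving_paths({"HBA0", "FC01", "SPA_FE1", "SPB_FE0"}))
```

Under these assumptions, which components fail together matters: losing HBA0 and FC01 at the same time still leaves the HBA1 side intact, while losing HBA0 and FC02 together would leave no path at all.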
This redundancy does not end at the two storage processors. The storage processors connect through redundant backplanes (usually high-speed 2 or 4 gigabit copper) that continue the mesh to the individual DAEs, which in turn are populated with high-speed, highly reliable SCSI drives. The drives in the DAEs can then be grouped in various configurations at the common RAID levels.
Now that everything is hooked up, you need to carve up some of your drives into a LUN and RAID, assign the LUN to the target server, configure the server, format the drives, and you're in business. In fact, it isn't uncommon to encounter a SAN that is configured this way. But, if you did this, all traffic across your switches would be broadcast traffic, and, if you've got a lot of target servers, a lot of LUNs, and a lot of traffic, this could have a significant impact on your overall throughput.
One reason organizations spend a boatload of money on a SAN is speed, so giving throughput away to broadcast traffic defeats the purpose. The solution is zoning; unfortunately, it is one of the more complex aspects of SAN configuration, which may be why so many people either skip it entirely or do it wrong. Zoning is done at the switch: you basically define paths or routes from an HBA through the switch to the storage processor.
Everything I discussed so far is intended to facilitate two things: being highly redundant and being relatively easy to zone. You would now connect to your switch and log in. Each HBA port installed in the server has a unique address called a World Wide Name (WWN); WWNs look like really long MAC addresses, which is effectively what they are.
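A WWN is a 64-bit value, usually written as eight colon-separated hex pairs, where a MAC address has six. As a small sketch of what that looks like in practice, the hypothetical helper below tidies up a WWN copied from a label or a switch listing; the WWN in the example is made up for illustration.

```python
import re

# Sixteen hex digits = 64 bits, once separators are stripped.
_WWN_DIGITS = re.compile(r"^[0-9a-f]{16}$")


def normalize_wwn(raw: str) -> str:
    """Return a WWN as eight colon-separated lowercase hex pairs."""
    digits = re.sub(r"[:\-\s]", "", raw).lower()
    if not _WWN_DIGITS.match(digits):
        raise ValueError(f"not a 64-bit WWN: {raw!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


# The value below is a made-up example, not a real adaptor's WWN.
print(normalize_wwn("10000000C9123456"))  # prints 10:00:00:00:c9:12:34:56
```

Normalizing to one format before comparing makes it harder to zone the wrong port because of a stray separator or a difference in letter case.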
There are utilities that will tell you what WWN you are working with, but a simple and effective shortcut is to know which port on the switch runs to which port on the HBA card. This just makes common sense. I recommend labeling each end of the cable, even on short runs. Once you know the port on the switch, you can access the switch itself, find that port, and find the WWN.
If you inherit a SAN and the cable management wasn't well done, you'll probably need a utility to determine what WWN is hooked up to what port on your switch. It is much easier to plan carefully ahead of time and clearly label things; this is why we've put each HBA into Port 1 on each respective switch. It is fairly simple to open up our switch, trace port 1 back, and determine which HBA WWN is connected to it.
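The labeling advice can be backed by a simple record of what is plugged in where. The sketch below keeps that record as a Python dictionary keyed by switch and port; the server name and both WWNs are hypothetical placeholders, and in practice you would fill them in from your own labels and the switch's port listing.

```python
# Cable inventory keyed by (switch, port). All values are hypothetical.
PORT_MAP = {
    ("FC01", 1): {"server": "mysrv01", "hba": "HBA0",
                  "wwn": "10:00:00:00:c9:12:34:56"},
    ("FC02", 1): {"server": "mysrv01", "hba": "HBA1",
                  "wwn": "10:00:00:00:c9:12:34:57"},
}


def wwn_for(switch: str, port: int) -> str:
    """Look up the WWN recorded for a given switch port."""
    entry = PORT_MAP.get((switch, port))
    if entry is None:
        raise KeyError(f"no cable recorded for {switch} port {port}")
    return entry["wwn"]


print(wwn_for("FC01", 1))
```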
Zoning is a matter of creating an alias name and defining the correct path from an HBA port through the switch and to the DAEs. An important concept to keep in mind here is the redundancy. You've got two switches, two HBA ports, and two storage processors with four paths to the switches. You're going to have to log into each switch individually and create a unique alias for each HBA port to each port on both storage processors.
Generally speaking, zoning requires three steps: (1) making an Alias and adding the correct WWN to it, (2) creating a zone and adding the correct Alias to it, and (3) making these part of the zone configuration and making them active. You'll do this two times per switch -- once for HBA0 and once for HBA1 on Storage Processor A, and once for HBA0 and once for HBA1 on Storage Processor B.
When performing the zoning, remember that FE0 and FE1 are reversed on Storage Processor A and Storage Processor B. Conceptually, for each server, you should understand that you'll log into your first switch, create an alias, create a zone, and add the alias to the zone once for HBA0 and again for HBA1. You'll repeat the entire process on the second switch, but FE0 and FE1 will be reversed on this switch. Once the steps are completed, all traffic between our target server and the SAN is routed and not broadcast, cutting down tremendously on unnecessary chatter on our SAN.
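One way to keep the reversed ports straight is to derive the zoning work list from the cabling instead of from memory. The Python sketch below does that for the example mesh. It assumes the two switches are not linked to each other, so each switch only pairs the HBA and storage processor ports physically cabled to it; if your fabric is built differently, the pairs will differ. It prints a plan only and does not issue any switch commands.

```python
from collections import defaultdict

# Cabling from the example; port labels are illustrative.
HBA_TO_SWITCH = {"HBA0": "FC01", "HBA1": "FC02"}
SP_PORT_TO_SWITCH = {
    "SPA_FE0": "FC02",
    "SPA_FE1": "FC01",
    "SPB_FE0": "FC01",
    "SPB_FE1": "FC02",
}


def zoning_plan():
    """Group (hba, sp_port) pairs by the switch they share."""
    plan = defaultdict(list)
    for hba, switch in HBA_TO_SWITCH.items():
        for sp_port, sp_switch in SP_PORT_TO_SWITCH.items():
            if sp_switch == switch:
                plan[switch].append((hba, sp_port))
    return dict(plan)


for switch, pairs in zoning_plan().items():
    print(f"Log in to {switch}:")
    for hba, sp_port in pairs:
        print(f"  1. create an alias for {hba} to {sp_port} and add the WWN")
        print("  2. create a zone and add the alias to it")
        print("  3. add the zone to the configuration and activate it")
```

With this cabling, the plan pairs SP-A with FE1 and SP-B with FE0 on FC01, and the opposite on FC02, which is the reversal to watch for.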
EMC recommends a pretty strict naming convention for creating alias names. The recommended naming scheme is "Servername_HBA#_SANname_Storage Processor_Port". For example, mysrv01_hba0_mysan01_SPA_FE0 is the kind of alias name you would expect to see. The benefit of this naming convention is two-fold: It is very descriptive of what you are working with, and anyone else familiar with this naming convention will quickly be able to determine what is going on with a particular alias.
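If you generate alias names from a script or a spreadsheet, a small helper keeps them consistent. This is a minimal Python sketch of the convention described above; the function name and its parameters are my own, not part of any EMC tool.

```python
def make_alias(server: str, hba: int, san: str, sp: str, port: int) -> str:
    """Build an alias as Servername_HBA#_SANname_StorageProcessor_Port."""
    sp = sp.upper()
    if sp not in ("A", "B"):
        raise ValueError(f"storage processor must be 'A' or 'B', got {sp!r}")
    return f"{server}_hba{hba}_{san}_SP{sp}_FE{port}"


print(make_alias("mysrv01", 0, "mysan01", "A", 0))  # prints mysrv01_hba0_mysan01_SPA_FE0
```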
Understanding these basic concepts will give you a tremendous head start in learning how to properly administer, manage, and deploy SAN solutions; however, there is still no substitute for attending official SAN training provided by your SAN vendor. If your organization is considering deploying a SAN solution, for the amount of money it will invest in even an entry-level SAN, it pays incredible dividends to bundle training into the purchase price.
Donovan Colbert has over 16 years of experience in the IT Industry. He's worked in help-desk, enterprise software support, systems administration and engineering, IT management, and is a regular contributor for TechRepublic. Currently, his professional role is as a Linux support engineer for a fast-growing Linux/FOSS consultancy group. You can follow him @dcolbert on Twitter or his personal blog, located at http://donovancolbert.blogspot.com.