In the first part of this series, I described my environment at a small liberal arts college and our current storage dilemma. I also explained what I consider important when it comes to making an appropriate storage decision. In part two, I will expand on my priority list and explain what I'm trying to achieve.
Block-level shared storage that easily connects to servers
When it comes to shared storage in an enterprise environment, there are two choices: network-attached storage (NAS) and block-level shared storage in the form of a storage area network (SAN). A NAS device, which generally provides file-level access to its contents, is not what I am seeking for this particular project. Instead, I am looking for a device that provides block-level storage similar to what local disks provide. In fact, volumes housed on a SAN behave as if they were locally attached. With the exception of initial volume creation, SAN-based volumes are managed using OS-specific tools. For example, although the volume space is dedicated on the SAN using the SAN's management tools, under Windows, formatting the volume and other management tasks are done through Disk Management. Think of the initial SAN-based volume space as an empty disk partition.
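To make the "empty disk" idea concrete, here is a hedged sketch of what attaching a SAN volume looks like on a Linux host using the open-iscsi tools. The target address and IQN are hypothetical, and the filesystem steps are illustrative; on Windows, the equivalent work happens in the iSCSI Initiator applet and Disk Management.

```shell
# Discover targets exported by the array (10.0.0.50 is a hypothetical SAN address)
iscsiadm -m discovery -t sendtargets -p 10.0.0.50

# Log in to the discovered target; the LUN then appears to the OS as a
# plain local block device, e.g. /dev/sdb
iscsiadm -m node -T iqn.2007-01.edu.example:san.vol1 -p 10.0.0.50 --login

# From this point on, the OS manages it like any local disk:
fdisk -l /dev/sdb          # verify the new device is present
mkfs.ext3 /dev/sdb1        # format it with ordinary OS tools
mount /dev/sdb1 /data
```

The key point the commands illustrate: after login, nothing about the device looks "remote" to the operating system, which is exactly the behavior described above.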
While a NAS device's file-level sharing is useful in some instances, such as when you intend to use it as a file server, a SAN's block-level storage provides a number of benefits, including:
- Very high availability.
- Ability to be used with a wider variety of applications.
- Higher levels of performance.
- Better overall use of the available storage space.
There are three block-level SAN technologies on the market: Fibre Channel, iSCSI, and ATA over Ethernet (AoE). Unfortunately, AoE has not really taken off in the marketplace, leaving Fibre Channel and iSCSI as the two main contenders for my project.
100 percent redundant
Let’s face the brutal facts: Data storage is a pretty darn important component of a data center of any size. A whole lot of money is spent at the time of server purchase on things like RAID controllers and enough reliable disks to make up, in many cases, a RAID 5 array. After all, if you’re running in a non-redundant scenario, a single failed disk can ruin your whole day…as well as the days for a whole lot of people who can no longer access the affected services.
Why would anyone buy a SAN that did not live up to the reliability numbers provided by local storage?
In my search, I’m looking for a solution that is fully redundant and that can withstand the failure of any single component, including the infrastructure between the SAN and the servers. Any solution I consider must support dual controllers, have redundant power supplies, and run in a reasonable RAID configuration.
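As a sanity check on how much raw disk a redundant configuration consumes, here's a quick sketch. The disk counts and sizes are my own illustrative numbers, not from any vendor's configuration.

```python
def raid5_usable_gb(disks, disk_gb):
    """Usable capacity of a RAID 5 array: one disk's worth of space
    is consumed by distributed parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * disk_gb

# A hypothetical shelf of six 300 GB disks yields 1,500 GB usable,
# and the array survives any single disk failure.
print(raid5_usable_gb(6, 300))  # 1500
```

The "cost" of redundancy here is one disk out of six, which is why RAID 5 is such a common middle ground between capacity and protection.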
As I stated, I also plan to make sure the linking network infrastructure, be it Fibre Channel switches or gigabit Ethernet switches for iSCSI, is bullet-proof. Every server will get multiple connections to the storage, via separate switches, creating a mesh architecture that can withstand the loss of any NIC/Fibre Channel HBA, cable, or switch.
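On the server side, that mesh only pays off if the OS knows how to fail over between paths. As one hedged example of what that configuration can look like, here is a minimal Linux dm-multipath fragment; the policy and values are illustrative, not taken from any particular array's documentation (Windows hosts would use the vendor's MPIO driver instead).

```
# /etc/multipath.conf -- minimal sketch, assuming two paths to each LUN
defaults {
    user_friendly_names  yes
    # fail over to the surviving path when a NIC, cable, or switch dies
    path_grouping_policy failover
    # probe paths every 5 seconds so a dead path is noticed quickly
    polling_interval     5
}
```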
Reasonable cost
Like many of you, I'm operating on a pretty tight IT budget, so I have to be pretty careful about what I buy. I probably won't be sitting with my CFO discussing the merits of EMC's Symmetrix product line. While that product line is absolutely appropriate in some environments, I just don't need it. Heck, even something on the lower end of the EMC scale, such as a CLARiiON AX150 (with redundant controllers, of course), might work for me.
This is also a good time to talk about overall solution performance. I don't need something that costs a ton of money just to squeeze out a few more IOPS (input/output operations per second). In fact, I'm still up in the air as to whether to go with iSCSI or with lower-end Fibre Channel. On paper, of course, Fibre Channel, running at 2 Gbps or 4 Gbps, is far superior in theoretical speed to iSCSI, which tops out at 1 Gbps over gigabit Ethernet. In reality, however, the difference is not that significant for the type of traffic I need to support. My solution needs to support:
- About 1,250 Exchange mailboxes. We’ll also be going to Exchange 2007’s Unified Messaging solution this summer, but this just becomes a part of the Exchange information store.
- A few SQL Server 2000 databases, including the ones that support our primary student administrative system, fundraising efforts, and help desk.
- A significant (for us) SharePoint 2007 data store, running on SQL Server 2005.
- Some virtual machines running atop VMware ESX 3, which now supports a number of iSCSI SANs.
- File storage. We will be moving our files away from our existing NAS device.
Just about any reasonable iSCSI or low-end Fibre Channel SAN can support these needs with one controller tied behind its backplane. Any potential bottlenecks are not likely to be the storage connection, even if that connection is running at "only" 1 Gbps.
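Some back-of-the-envelope arithmetic backs that up. The numbers below are my own rough assumptions, not vendor figures, but they show how far even a single gigabit link is from being the bottleneck for workloads like these.

```python
def link_throughput_mb_s(gbps):
    """Theoretical payload throughput of a storage link in MB/s,
    ignoring protocol and encoding overhead (an optimistic simplification)."""
    return gbps * 1000 / 8

iscsi_1g = link_throughput_mb_s(1)   # 125.0 MB/s
fc_2g    = link_throughput_mb_s(2)   # 250.0 MB/s
fc_4g    = link_throughput_mb_s(4)   # 500.0 MB/s

# Rough mailbox math: if 1,250 Exchange users each averaged a (generous)
# 0.5 IOPS at 8 KB per I/O, that is about 5 MB/s of sustained traffic --
# a small fraction of even the slowest link above.
mailbox_mb_s = 1250 * 0.5 * 8 / 1024
print(iscsi_1g, fc_2g, fc_4g, round(mailbox_mb_s, 2))
```

Real arrays hit disk, cache, and controller limits long before a saturated wire in workloads like this, which is why the 1 Gbps versus 4 Gbps spec-sheet gap matters less than it first appears.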
Snapshots
Snapshots can be a life-saver. A snapshot is exactly what it sounds like: a point-in-time "picture" of a volume of SAN-housed data. In my previous position, snapshots saved our behinds when someone ran a query against a SQL database and corrupted it. We were able to restore the database from a snapshot that was only 10 minutes old, preserving an entire day's work on that database.
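Every array exposes snapshots through its own management tools, but the underlying idea is the same copy-on-write trick you can try at the OS level. As a hedged illustration only (the volume group and names are hypothetical), the Linux LVM equivalent looks like this:

```shell
# Take a point-in-time snapshot of the volume holding the database;
# 2G reserves space for blocks that change after the snapshot is taken
lvcreate --size 2G --snapshot --name db-pre-query /dev/vg0/db

# ...someone corrupts the database with a bad query...

# Mount the snapshot read-only and copy the pre-corruption data back
mount -o ro /dev/vg0/db-pre-query /mnt/snap
```

SAN-side snapshots work the same way conceptually, but happen on the array, invisible to the attached servers.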
While I would love a solution that has great snapshot capability, my main driving factors are product reliability and cost. However, at least some level of snapshotting is essential. Some vendors think this is still an optional feature and charge more for it.
Replication
Like snapshots, replication is on my "would be nice if" list for this project, as we are working on a disaster recovery plan. With replication, I could, in the future, add a second unit elsewhere on campus, or at a facility in another location entirely, and replicate the contents of my primary array to it. If a disaster affected my data center, the data would be protected and fairly up to date. Ideally, my disaster recovery plan would include replication plus virtual machines housed on the SAN and running under VMware's VMotion, with just a couple of hot-standby servers at the remote location.
Now you know exactly what I’m looking for and some of the reasoning behind it. In part three of this series, I’ll talk about how I narrowed the playing field and selected the solutions for my short list, and I will also present my short list.