
Seven storage design flaws that could land you in hot water

If you're designing a new storage system, read about these seven storage gotchas that could lead to you having a lot of time on your hands.

Designing a storage solution isn't a trivial undertaking; there are many moving parts, many decisions to be made, and just as many mistakes that can be made. Here are seven mistakes that might lead to you getting in trouble.

1. Not taking RAID storage overhead into consideration.

Unfortunately, I've actually seen this happen. Any responsible storage implementation will probably use RAID to protect against the loss of one or more disks. With the exception of RAID 0, which simply stripes data across a group of disks to create a larger storage pool, every RAID level sets aside some capacity for mirror or parity information, and that overhead can be substantial. In a RAID 1 implementation, for example, 50% of the total disk space holds the copy of the data on the mirrored drives. RAID 10, an extension of RAID 1 that stripes data across multiple RAID 1 sets to improve performance, exacts the same 50% space toll but is frequently used because of its significant performance benefits. Don't forget to factor RAID overhead into how much storage you need to buy.

RAID storage penalty for common RAID levels:

  • RAID 0: No storage penalty, but no protection either.
  • RAID 1: 50% storage penalty (mirrored disks).
  • RAID 5: 1/n storage penalty where n is the number of disks that make up the array.
  • RAID 6: 2/n storage penalty where n is the number of disks that make up the array.
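To turn the penalties above into a purchasing number, a quick calculation helps. The following is a minimal Python sketch; the function name usable_capacity is hypothetical, and the sketch ignores hot spares, file system formatting overhead, and the difference between decimal and binary terabytes.

```python
def usable_capacity(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity (TB) of one RAID set, using the penalties listed above."""
    if level == "0":
        return disks * disk_tb                  # striping only: no overhead
    if level in ("1", "10"):
        minimum = 4 if level == "10" else 2
        if disks < minimum or disks % 2:
            raise ValueError(f"RAID {level} needs an even number of disks, at least {minimum}")
        return disks * disk_tb / 2              # 50% holds the mirror copy
    if level == "5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb            # 1/n of the space holds parity
    if level == "6":
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (disks - 2) * disk_tb            # 2/n of the space holds parity
    raise ValueError(f"unsupported RAID level: {level}")


# Example: eight 2 TB disks (16 TB raw) under each RAID level.
for level in ("0", "10", "5", "6"):
    print(f"RAID {level}: {usable_capacity(level, 8, 2.0):.1f} TB usable")
```

With eight 2 TB disks, that works out to 16 TB under RAID 0, 8 TB under RAID 10, 14 TB under RAID 5, and 12 TB under RAID 6, before any hot spares are set aside.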


2. Not taking RAID performance overhead into consideration.

RAID exacts more than just a storage penalty; in addition to reducing the amount of usable disk space, different RAID levels also affect the overall performance of the storage system. Different applications require different storage performance characteristics, and different RAID levels are best suited to different kinds of applications. For example, because every write to a RAID 5 or RAID 6 array requires parity to be recalculated and written, those RAID levels are not always suitable for write-intensive workloads such as SQL Server log files. Choosing a RAID level that doesn't match your application's I/O pattern will leave performance on the table.

In general, here are some pointers:

  • RAID 1: Read: Good, Write: Good
  • RAID 5: Read: Good, Write: Mediocre
  • RAID 6: Read: Good, Write: Poor (double parity calculation and storage)
  • RAID 10: Read: Very Good, Write: Very Good

Don't take this list to the bank, though; performance needs and characteristics vary wildly between applications, so do your homework!
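The read/write ratings above largely follow from the write penalty each level imposes. Commonly cited figures are two back-end disk I/Os per write for mirroring, four for RAID 5 (read old data and old parity, write new data and new parity), and six for RAID 6. As a rough Python sketch, assuming those textbook penalties, evenly loaded disks, an illustrative 180 IOPS per disk, and no controller write cache (which in practice can hide much of the penalty), the effect on a mixed workload looks like this; effective_iops is a hypothetical helper name:

```python
# Back-end disk I/Os generated by one front-end write (textbook values).
WRITE_PENALTY = {
    "0": 1,   # write the data once
    "1": 2,   # write the data and its mirror
    "10": 2,  # same as RAID 1
    "5": 4,   # read data + read parity, write data + write parity
    "6": 6,   # as RAID 5, but with two parity blocks
}


def effective_iops(level: str, disks: int, per_disk_iops: float, write_fraction: float) -> float:
    """Front-end IOPS an array can sustain for a given read/write mix (no cache)."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw_iops = disks * per_disk_iops
    # Each front-end read costs one back-end I/O; each write costs the penalty.
    cost_per_io = (1.0 - write_fraction) + write_fraction * WRITE_PENALTY[level]
    return raw_iops / cost_per_io


# Example: eight disks assumed to deliver 180 IOPS each, 50% writes.
for level in ("10", "5", "6"):
    print(f"RAID {level}: {effective_iops(level, 8, 180, 0.5):.0f} IOPS")
```

Under these assumptions, the same eight disks deliver about 960 IOPS as RAID 10, 576 as RAID 5, and 411 as RAID 6 for a half-write workload.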


3. Not implementing a solution with enough spindles.

IOPS (Input/Output Operations Per Second) is a standard measure of storage performance. While many elements go into determining the total I/O capacity of a storage infrastructure, the number of spindles (a common way to refer to the number of disks in a storage solution) is one of the most important ones you can design in: the more spindles you throw at a solution, the better the overall performance will be. Many people assume that the transport mechanism (iSCSI, Fibre Channel, etc.) is the primary limiting factor from a performance standpoint, but this is often not the case. Each individual disk in your storage system is capable of a maximum number of IOPS. Multiply that number by the number of usable disks in your RAID configuration to arrive at a theoretical maximum IOPS value.

For some applications, you can figure out the number of IOPS that you need, but for other applications, you need to work with the vendor to arrive at a reasonable calculation. Without enough spindles to support your load, the rest of the storage design simply won't matter.
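Once you know (or your vendor has helped you estimate) the IOPS and read/write mix an application needs, the spindle count follows from simple arithmetic. Below is a minimal Python sketch; the function name spindles_needed, the per-disk figure of 180 IOPS, and the textbook write penalties are all assumptions for illustration, and controller cache is ignored.

```python
import math

# Back-end I/Os per front-end write (commonly cited textbook values).
WRITE_PENALTY = {"0": 1, "1": 2, "10": 2, "5": 4, "6": 6}
# Smallest number of disks each level can be built from.
MIN_DISKS = {"0": 1, "1": 2, "10": 4, "5": 3, "6": 4}


def spindles_needed(level: str, target_iops: float, write_fraction: float,
                    per_disk_iops: float) -> int:
    """Disks required so the back-end load stays within the per-disk IOPS ceiling."""
    backend_iops = target_iops * ((1 - write_fraction) + write_fraction * WRITE_PENALTY[level])
    disks = max(math.ceil(backend_iops / per_disk_iops), MIN_DISKS[level])
    if level in ("1", "10") and disks % 2:
        disks += 1  # mirrored sets come in pairs
    return disks


# Example: 2,000 front-end IOPS, 30% writes, disks assumed to deliver 180 IOPS each.
for level in ("10", "5", "6"):
    print(f"RAID {level}: {spindles_needed(level, 2000, 0.3, 180)} disks")
```

With those assumptions, RAID 10 needs 16 disks, RAID 5 needs 22, and RAID 6 needs 28 to reach the same 2,000 IOPS, which is why RAID level and spindle count have to be chosen together.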

4. Choosing a RAID level that leaves your organization at risk.

For some, RAID has long been considered the gold standard when it comes to data protection; however, when used incorrectly, that protection might be only an illusion. Besides storage and performance needs, your choice of RAID level needs to take into account the level of protection you want to maintain in the environment. RAID 5 is, by far, the most common RAID level out there and, when used correctly, will provide organizations with a degree of protection. However, as drive sizes get larger, the risk of data loss increases quickly: a rebuild has to read every surviving disk in full, so larger disks mean longer rebuilds and more chances for a second failure or an unrecoverable read error before the rebuild finishes. Since RAID 5 can tolerate the loss of only a single disk, losing a second disk before the rebuild completes is a recipe for disaster.
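To see why risk grows with drive size, note that a RAID 5 rebuild reads every surviving disk end to end, and each read carries a small chance of an unrecoverable read error (URE) that the degraded array can no longer correct. Here is a back-of-the-envelope Python sketch, assuming independent errors and the 1-in-10^14-bit URE rate often printed on desktop-class SATA datasheets; real-world behavior differs, so treat the results as illustrative. The function name rebuild_ure_probability is hypothetical.

```python
import math


def rebuild_ure_probability(surviving_disks: int, disk_tb: float,
                            bits_per_ure: float = 1e14) -> float:
    """Chance of at least one unrecoverable read error while a RAID 5 rebuild
    reads every surviving disk once.

    Assumes independent errors at the datasheet rate of one per `bits_per_ure` bits.
    """
    bits_read = surviving_disks * disk_tb * 1e12 * 8  # decimal TB, as drive vendors quote
    # Equivalent to 1 - (1 - 1/bits_per_ure) ** bits_read, but numerically stable.
    return -math.expm1(bits_read * math.log1p(-1.0 / bits_per_ure))


# Example: four-disk RAID 5 arrays that have lost one disk (three survivors).
for disk_tb in (0.5, 1.0, 2.0):
    p = rebuild_ure_probability(3, disk_tb)
    print(f"{disk_tb} TB disks: {p:.0%} chance of a URE during rebuild")
```

Under this model, the chance of hitting a URE during the rebuild climbs from roughly 11% with 0.5 TB disks to about 21% with 1 TB disks and 38% with 2 TB disks. Passing bits_per_ure=1e15, a rate often quoted for enterprise drives, brings the 2 TB case under 5%.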


5. Using the wrong kind of disk.

I've already said that you need enough spindles to support the needs of your application environment. Along with the spindle count, make sure you get the right kind of disks. Not all disks are created equal, either from an IOPS perspective or from a reliability perspective. First, SATA disks can be one or two orders of magnitude less reliable than SAS disks and create a much higher risk of data loss (see my article on unrecoverable read errors, or UREs). Second, most SATA disks spin more slowly than their SAS counterparts. Although there are enterprise-grade SATA disks that spin at 10K RPM, SAS disks almost always have a 10K RPM minimum speed and can spin as fast as 15K RPM. The faster a disk spins, the less time it spends waiting for data to rotate under the head, so the more quickly it can read and write information and, hence, the higher its IOPS value.

Note that there are tricks (such as short-stroking) that you can use to force more IOPS from a disk, but I'm not going to get into those here.
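The link between spindle speed and IOPS can be approximated from two datasheet numbers: average seek time and rotational latency (the time for half a revolution). The following Python sketch assumes one seek plus half a rotation per random I/O and ignores transfer time and queuing; the seek times in the example are illustrative assumptions, not figures for any particular drive.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: the time for half a revolution, in ms."""
    return 0.5 * 60_000 / rpm


def estimate_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random-I/O ceiling for one disk: one average seek plus half a
    rotation per operation (transfer time and queuing are ignored)."""
    return 1000 / (avg_seek_ms + rotational_latency_ms(rpm))


# Assumed average seek times; check the datasheet for real drives.
for rpm, seek_ms in ((7_200, 8.5), (10_000, 4.5), (15_000, 3.5)):
    print(f"{rpm:>6} RPM: ~{estimate_iops(rpm, seek_ms):.0f} IOPS")
```

With these assumed seek times, the estimate comes to roughly 79, 133, and 182 IOPS for 7.2K, 10K, and 15K RPM disks. Short-stroking, mentioned above, works on the other term: confining data to the outer tracks reduces the effective avg_seek_ms.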


6. Not configuring a hot spare.

A hot spare is a critical part of a redundant storage system and provides the system with a way to immediately begin recovering from the loss of a disk due to hardware failure or some other catastrophe.  The quicker that an array begins to rebuild after a failure, the less likely it is that the array will suffer another disk fault that could end up resulting in the loss of data from the entire RAID volume.

Using a hot spare results in the immediate loss of that disk as usable space in the array. With many people creating multiple RAID sets on an array, you might be concerned about losing a hot spare per RAID set. Many arrays will allow you to configure a global hot spare that can automatically take the place of any drive in any RAID set across the entire array, so you can minimize your hot spare overhead while continuing to meet availability needs.

7. Not implementing enough redundancy.

Depending on how your storage environment will be used, you will implement different levels of redundancy. For primary, high-need storage, make sure you implement enough redundancy in the environment to meet business needs; that may mean dual controllers, dual UPSs, redundant data paths to the storage, redundant replicated arrays, and much more.

When designing your storage, draw every component on paper. Then, one at a time, place an X over each component, determine the impact if that particular component were to fail, and decide whether it needs an additional level of redundancy. For example, at Westminster, we use a dual-controller EMC AX4 iSCSI array. The whole storage infrastructure is redundant, from the controllers to the Ethernet switches that service the storage network. Each server that connects to the storage has multiple NICs and two connections to storage, and the two connections never share a NIC. Specifically, we use one motherboard NIC connection and one add-in Ethernet adapter connection to protect against the failure of a single NIC.
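The "X over each component" exercise can also be automated once the storage path is written down as a graph. Below is a minimal Python sketch. The topology is a hypothetical simplification loosely modeled on the setup just described, not Westminster's actual configuration, and power and individual disks are left out because the disk pool's protection comes from RAID. A component counts as a single point of failure if removing it leaves no path from the server to the disk pool.

```python
from collections import deque


def path_exists(graph: dict[str, list[str]], src: str, dst: str, failed: str) -> bool:
    """Breadth-first search from src to dst that treats `failed` as removed."""
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in graph.get(node, []):
            if neighbor != failed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def single_points_of_failure(graph: dict[str, list[str]], src: str, dst: str) -> list[str]:
    """'Put an X' over each component between src and dst, one at a time, and
    return the components whose failure cuts src off from dst."""
    components = set(graph) | {n for neighbors in graph.values() for n in neighbors}
    return sorted(c for c in components - {src, dst}
                  if not path_exists(graph, src, dst, c))


# Hypothetical topologies; edges point from the server toward the storage.
redundant = {
    "server": ["nic_onboard", "nic_addin"],      # two NICs, no shared adapter
    "nic_onboard": ["switch_a"],
    "nic_addin": ["switch_b"],
    "switch_a": ["controller_a", "controller_b"],
    "switch_b": ["controller_a", "controller_b"],
    "controller_a": ["disk_pool"],
    "controller_b": ["disk_pool"],
}
single_path = {
    "server": ["nic_onboard"],
    "nic_onboard": ["switch_a"],
    "switch_a": ["controller_a"],
    "controller_a": ["disk_pool"],
}

print(single_points_of_failure(redundant, "server", "disk_pool"))    # []
print(single_points_of_failure(single_path, "server", "disk_pool"))  # ['controller_a', 'nic_onboard', 'switch_a']
```

The fully redundant layout returns an empty list, while the single-path layout flags its NIC, switch, and controller; each name returned is a candidate for the "does this need another level of redundancy?" question.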



About

Since 1994, Scott Lowe has been providing technology solutions to a variety of organizations. After spending 10 years in multiple CIO roles, Scott is now an independent consultant, blogger, author, owner of The 1610 Group, and a Senior IT Executive w...

7 comments
RicardoMenendez

Great articles! I am learning so much about Storage just by reading your articles! I can't praise your work enough! Keep them coming please!!

shortonjr

This is great data here. I don't understand why it is not available in pdf format

vigremrajesh

As a system admin, I found this information helped me avoid future mistakes. Thanks for the information.

RBatten

Good article, Scott, very well written and laid out. For a storage professional, this may be basic information, however it is always good to review the core considerations for designing and setting up a storage node.

Michael Kassner

Storage is an area where I don't profess much knowledge. The post was a great read and I learned quite a bit.

SMparky

Unfortunately, I think most decisions come down to one thing: cost. I'll get a pat on the back for proposing a server with a reasonable price. If I proposed the absolute best system that I would like to buy, my purchases would be rejected, and my boss would feel I had no common sense or concern for prices. It's good to know and understand the options, but many of us will never be allowed to ignore costs. I know there are many costs and risks associated with implementing a less than ideal configuration, but execs usually won't care about them unless a disaster makes them think twice. That's just human nature.

MarvinH

As a Solutions Architect for a major storage vendor, I believe this article is great for anyone who is new to storage or needs a refresher. These are the minimum considerations when looking for a storage solution. Regarding disk rotation speed, there are some cases where a relatively low RPM disk may be acceptable, particularly for archiving applications. The initial cost and power savings may make it worth considering, depending on your requirements. Compare the disks' internal cache size and average seek time between models and vendors, and check their Mean Time Between Failures (MTBF) specs as well. Great article - keep 'em coming!
