Understand RAID from the ground up

Get the scoop on RAID with James McPherson's Daily Drill Down. He covers why, when, and how you should use it for optimum performance upgrades. He also includes a breakdown of each configuration.

Ah, the joys of RAID. No, I do not mean the bug spray. I’m referring to something far more useful: redundant array of independent disks (RAID). RAID is a way to combine multiple hard drives into a single large volume with increased performance, redundancy, and less downtime. However, it comes at a cost of space and power, as you run a number of inexpensive drives in a redundant array.

History and technology
RAID came about many years ago when someone came to the realization that it would be cheaper per megabyte to buy two or more slightly older low-density drives than a single cutting edge high-density drive. They then realized it shouldn’t be that hard to configure multiple drives to store data across multiple hard drive platters, the way a single drive does. Add a little hardware wizardry to hide all the work from the motherboard, and you have a RAID controller.

RAID works by splitting up the data between disks in an interleaved fashion. One example of a RAID configuration is in Table A below.

Table A
You can see that by splitting the data this way, the controller can be fairly confident that the work is divided among all the drives, maximizing performance.
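To make the interleaving concrete, here is a minimal Python sketch of round-robin striping. It assumes the simplest possible scheme, where logical blocks are dealt out to the drives like cards. The function name stripe_location and the three-drive example are my own illustration, not a reproduction of Table A, and real controllers work in larger stripe units.

```python
def stripe_location(block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, row on that disk)."""
    if num_disks < 1:
        raise ValueError("an array needs at least one disk")
    # Consecutive blocks land on consecutive disks; after the last disk
    # the controller wraps around and moves down one row.
    return block % num_disks, block // num_disks


# Hypothetical three-drive array holding six logical blocks.
layout = [stripe_location(block, 3) for block in range(6)]
for block, (disk, row) in enumerate(layout):
    print(f"block {block} -> disk {disk}, row {row}")
```

Because neighboring blocks sit on different drives, a large sequential read keeps every drive busy at once.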

The redundant part of RAID comes from having the data copied to multiple drives. You can get different levels of redundancy, performance, and usable space, depending on which RAID level you use.

Why would I want it, and where would I put it?
Let’s pretend that you have a lot of data that needs to be stored on a single volume. Say it’s something important, like your company’s billing database or the digital video intended for your next commercial, something that can easily grow to a hundred gigabytes of data or more, that you just can’t live without. Now, let’s pretend that you’ve gone drive shopping and have realized that cutting edge hardware can be costly. Since your data absolutely has to be on a single volume, you can either shell out a bundle for a 100-GB drive or pick up a handful of 20-GB drives.

RAID is excellent for file servers, databases, video editing stations, or proxy servers that need high-speed data transfers. When it comes to moving data, RAID is faster than any single drive available. For example, a single IDE drive will typically operate at about 30 MB/s when doing large sustained transfers, with a maximum burst transfer rate of 100 MB/s. On a gigabit Ethernet connection, you have 125 MB/s of theoretical bandwidth available. It would take roughly four IDE drives to fill that once their buffers were empty, which would only take a few seconds. Since an IDE controller can only run one of the two drives on its chain at a time, you would need four controllers, which is more than most motherboards have.

SCSI drives are a little faster, pulling in an extra 5 MB/s peak transfer on the more expensive drives and burst transfers up to 160 MB/s. SCSI controllers are a bit more effective than IDE controllers, since they can operate more than a half-dozen devices with all the drives talking to each other simultaneously while not touching the motherboard. However, you still need three or four SCSI drives to fill that gigabit connection.
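The drive counts quoted above are round figures. The arithmetic behind them can be checked with a short Python sketch. It assumes the sustained rates given in the text (30 MB/s for IDE, and 35 MB/s for SCSI, which is my reading of "an extra 5 MB/s") and ignores controller and protocol overhead. The helper name drives_to_saturate is hypothetical.

```python
import math

GIGABIT_MB_S = 1000 / 8  # 1000 megabits per second is 125 MB/s


def drives_to_saturate(link_mb_s: float, drive_mb_s: float) -> int:
    """Smallest whole number of drives whose combined rate fills the link."""
    return math.ceil(link_mb_s / drive_mb_s)


ide_ratio = GIGABIT_MB_S / 30   # about 4.2 drives' worth of bandwidth
scsi_ratio = GIGABIT_MB_S / 35  # about 3.6 drives' worth of bandwidth

print(f"IDE:  {ide_ratio:.2f} drives, so {drives_to_saturate(GIGABIT_MB_S, 30)} to fill the link")
print(f"SCSI: {scsi_ratio:.2f} drives, so {drives_to_saturate(GIGABIT_MB_S, 35)} to fill the link")
```

Four IDE drives get you to 120 MB/s and three SCSI drives to 105 MB/s, which is close to the link's limit but not quite at it.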

Levels of redundancy
Table B below shows some examples of different RAID configurations. For each configuration, I have included corresponding examples of typical values you would likely see. (The legend for the various configurations can be found in Table B.)

Table B

RAID 0—disk striping
The heart and soul of RAID is its ability to combine multiple disks into a single volume. RAID 0 is just that and nothing else. It offers no redundancy and, believe it or not, a reduced lifespan. How can that be? Without any form of integral backup, if any drive suffers a critical failure, the entire volume collapses. The more drives you have, the greater the odds of any one of them going bad.

On the plus side, you lose absolutely no space from your volume and gain maximum performance. Two 40-GB drives combine to a single 80-GB volume with double the transfer rate. Five drives turn into a 200-GB volume with quintuple the transfers. The math is really simple, as Table C and Table D illustrate.

Table C
Disk striping is really useful when the data isn’t important or size is a factor. Proxy servers are an example of the former, and video-editing stations would fit the latter.

Table D
RAID 0 configuration
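The RAID 0 trade-off is easy to put in numbers. The sketch below is a rough model, not a benchmark: it assumes identical drives, perfectly linear scaling of transfer rate, and independent drive failures. The 5 percent per-drive failure chance is a made-up figure chosen only to show the trend.

```python
def raid0_capacity_gb(drive_gb: float, num_drives: int) -> float:
    """Striping loses no space: capacities simply add up."""
    return drive_gb * num_drives


def raid0_transfer_mb_s(drive_mb_s: float, num_drives: int) -> float:
    """Idealized transfer rate, assuming every drive streams in parallel."""
    return drive_mb_s * num_drives


def raid0_failure_chance(drive_failure_chance: float, num_drives: int) -> float:
    """Chance the volume is lost: it dies if any single drive dies."""
    return 1 - (1 - drive_failure_chance) ** num_drives


for n in (1, 2, 5):
    print(
        f"{n} drive(s): {raid0_capacity_gb(40, n):.0f} GB, "
        f"{raid0_transfer_mb_s(30, n):.0f} MB/s, "
        f"{raid0_failure_chance(0.05, n):.1%} chance of losing the volume"
    )
```

Capacity and speed climb with every drive you add, and so does the chance of losing everything.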

RAID 1—disk mirroring
Here the data is, well, mirrored onto disk pairs. The result is maximum redundancy with minimal efficiency of space, since half of the raw capacity holds the copy. Multiple pairs can be used to increase total volume sizes. Writes aren’t any faster than a single drive, since both drives must write the same data, but reads can be up to twice as fast as a single drive, because either drive can answer a request. (See Table E and Table F below.)

Table E
Disk mirroring is popular for use on the operating system, because a disk failure requires nothing more than a reboot and a change in a setting on the controller to bring your machine back online.

Table F
RAID 1 configuration

RAID 2—error correction
Once upon a time, error correction (ECC) wasn’t a standard feature on all drives. RAID 2 provided a way to verify that data was written safely to disk. RAID 2 became obsolete several years ago when hard drives incorporated ECC as a standard feature. RAID 2 could only read and write as fast as a single drive, as described in Table G and Table H.

Table G
Drive space was also compromised since it took a significant amount of space for ECC.

Table H
RAID 2 configuration

RAID 3—parity
The first parity system, RAID 3, stores parity information, a value computed from the matching data on every data drive (typically with XOR), on a specific drive called the parity drive. When one of the data drives fails, the parity drive is used together with the surviving drives to rebuild the missing data. RAID 3 uses small spanning sectors, causing reads and writes to hit all active drives, keeping the data transfers similar to a single drive. (See Table I and Table J.)

Table I
RAID 3 has been superseded by levels that offer higher redundancy with equivalent performance.

Table J
RAID 3 configuration
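Parity rebuilds are less magical than they sound. In the parity-based levels, the parity block is the bitwise XOR of the data blocks in the same row, so XORing the survivors with the parity block gives back whatever was lost. The Python sketch below shows the idea on three tiny, made-up "drives" of four bytes each. The helper name xor_blocks is mine, and a real controller does this in hardware on whole sectors.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data drives and the parity drive computed from them.
data_drives = [b"RAID", b"3par", b"ity!"]
parity_drive = xor_blocks(data_drives)

# Simulate losing the second data drive, then rebuild it from the rest.
survivors = [data_drives[0], data_drives[2], parity_drive]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_drives[1])
```

The same trick works no matter which single drive is lost, including the parity drive itself, but it cannot recover from two failures at once.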

RAID 4—improved parity
RAID 4 tries to combine the parity ability of RAID 3 with the improved read performance of RAID 1. It uses a parity drive for recovery but uses much larger spanning sectors that allow most files to fit on a single drive rather than being split up. The large sectors enable different files to be read simultaneously, increasing performance almost to the level of RAID 0. Write performance is still limited to that of a single drive because of the shared parity drive. (See Table K and Table L.)

Table K
Like RAID 3, RAID 4 has been superseded by RAID 5.

Table L
RAID 4 configuration

RAID 5—best of all worlds
RAID 5 is the most common incarnation of RAID. Essentially, the parity data is spread out evenly across all the drives rather than being isolated on a single drive. This enables all drives to read simultaneously, maximizing read performance. Writing is still somewhat reduced because of the parity writes, but with the parity data for different rows landing on different drives, it is still noticeably improved. (See Table M and Table N.)

Table M
RAID 5 uses multiple drives that do double duty as both data storage and parity.

Table N
RAID 5 configuration
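Here is one way to picture how the parity gets spread around. The Python sketch below lays out a four-drive array row by row, moving the parity block one drive to the left on each row. This particular rotation is an assumption for illustration only; controllers differ in the exact pattern they use, and the function name raid5_row is hypothetical.

```python
def raid5_row(row: int, num_disks: int) -> list[str]:
    """Return the contents of one stripe row: data blocks plus one parity block."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    # Assumed rotation: parity starts on the last disk and shifts left each row.
    parity_disk = (num_disks - 1 - row) % num_disks
    next_block = row * (num_disks - 1)  # each row holds num_disks - 1 data blocks
    cells = []
    for disk in range(num_disks):
        if disk == parity_disk:
            cells.append("P")
        else:
            cells.append(f"D{next_block}")
            next_block += 1
    return cells


for row in range(4):
    print(" | ".join(f"{cell:>3}" for cell in raid5_row(row, 4)))
```

After four rows, each of the four drives has held the parity block exactly once, so no single drive becomes the bottleneck for parity writes.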

RAID 6—dual parity
Take RAID 5, add an extra level of parity that targets another drive, and you have RAID 6. (See Table O and Table P.)

Table O
Like RAID 5, RAID 6 enables all drives to read simultaneously, with reduced write speeds, but recovery and fault tolerance are improved.

Table P
RAID 6 configuration
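The price of that extra parity is one more drive's worth of space. As a rough sketch, assuming identical drives and counting only the space given up to parity, usable capacity works out as follows. The function name and the five 40-GB drives are illustrative, not values from Table O.

```python
def parity_array_usable_gb(drive_gb: float, num_drives: int, parity_drives: int) -> float:
    """Usable space when parity_drives' worth of capacity is spent on parity."""
    if num_drives <= parity_drives:
        raise ValueError("need more drives than parity blocks per row")
    return drive_gb * (num_drives - parity_drives)


# Five hypothetical 40-GB drives under single and dual parity.
raid5_gb = parity_array_usable_gb(40, 5, parity_drives=1)
raid6_gb = parity_array_usable_gb(40, 5, parity_drives=2)

print(f"RAID 5: {raid5_gb:.0f} GB usable, survives one failed drive")
print(f"RAID 6: {raid6_gb:.0f} GB usable, survives two failed drives")
```

The larger the array, the smaller the share of space that second parity block costs you.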

RAID 7—the “not RAID” RAID
RAID 7 is not an officially recognized RAID category but is a fully proprietary system from Storage Computer Corporation. The price of the management system is significant compared to the cost of the drives, thus reducing your savings. Further, the management system must be on a UPS so that all the cached parity data can be written to the drives if power is lost. On the plus side, the management system caches a lot of data, enabling read performance on heavily used files far in excess of the drive speeds (see Table Q and Table R).

Table Q
RAID 7 is a RAID 4-esque system that incorporates a more complicated parity management system using a real-time operating system.

Table R
RAID 7 configuration

RAID 10—striped disk mirroring
This is an array that stripes mirrored volumes. It does require some additional complexity at the controller level, but it is usually offset by the lack of parity calculations. (See Table S and Table T.)

Table S
RAID 10 provides the instant restore features of RAID 1 but boosts performance.

Table T
RAID 10 configuration

RAID 53—striped parity
Why RAID 53 is not RAID 30, I’m not sure. It is best used on volumes of sufficient size in which multiple-parity drives would be needed. It is simpler to use a dedicated parity drive rather than use the distributed parity system of RAID 5 and may be cost effective on some very large arrays. (See Table U and Table V.)

Table U
RAID 53 is a striped array of parity-drive equipped arrays.

Table V
RAID 53 configuration

RAID 0+1—mirrored disk striping
This is an array that mirrors striped volumes. RAID 0+1 is a general-purpose solution with decent performance and a reasonable level of tolerance. (See Table W and Table X.)

Table W
RAID 0 + 1 offers more reliability than just a striped system with quicker response than a mirrored volume.

Table X
RAID 0 + 1 configuration
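RAID 10 and RAID 0+1 use the same drives and give the same space, but the order of striping and mirroring changes which failures they survive. The Python sketch below counts the two-drive failures a four-drive array can ride out under each scheme. It assumes a simple model in which a striped set goes offline entirely as soon as one of its members fails; the drive numbering and function names are my own illustration.

```python
from itertools import combinations

# Four drives, grouped as (0, 1) and (2, 3) in both layouts.
GROUPS = [(0, 1), (2, 3)]


def raid10_survives(failed: set[int]) -> bool:
    """Groups are mirrored pairs: fine while no pair has lost both drives."""
    return all(not set(pair) <= failed for pair in GROUPS)


def raid01_survives(failed: set[int]) -> bool:
    """Groups are striped sets: fine while one whole set is untouched."""
    return any(not (set(stripe) & failed) for stripe in GROUPS)


two_drive_failures = [set(combo) for combo in combinations(range(4), 2)]
raid10_ok = sum(raid10_survives(failed) for failed in two_drive_failures)
raid01_ok = sum(raid01_survives(failed) for failed in two_drive_failures)

print(f"RAID 10 survives {raid10_ok} of {len(two_drive_failures)} two-drive failures")
print(f"RAID 0+1 survives {raid01_ok} of {len(two_drive_failures)} two-drive failures")
```

Both layouts survive any single failure. Under this model, the difference only shows up once a second drive goes before the first has been replaced.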

Spin down
RAID isn’t useful for all situations; most desktop machines can’t take advantage of the increased performance, and workstations can generally get by with frequent backups. Servers are prime candidates for RAID but only if there is enough capacity on the network or internal demand to justify it. However, until the hard drive manufacturers can build us 500-GB drives capable of more than 150 MB/s sustained transfer, RAID is here to stay.