
RAID 50 offers a balance of performance, storage capacity, and data integrity

RAID 50 is an often overlooked RAID level that can bridge the gap when it comes to choosing between RAID 5, RAID 6, and RAID 10. Scott Lowe explains why RAID 50 is his favorite RAID level.

RAID 50 is my favorite RAID level. Although RAID 50 support is not in every product (for example, my EMC AX4 at Westminster College does not support RAID 50), I find that RAID 50 provides a great balance between storage performance, storage capacity, and data integrity that's not necessarily found in other RAID levels.

If you haven't used RAID 50 before, you're in for a treat. As one of the many multilevel RAID options that are out there, RAID 50 operates by striping (RAID 0) data across multiple RAID 5 sets (Figure A).

Figure A: RAID 50 diagram

As you can see in the diagram, there are three RAID 5 sets that span a total of 12 disks. Each RAID 5 set has four disks, with one disk's worth of capacity dedicated to parity information. For the example above, this means that each RAID set will lose 25% of its total capacity to parity information, as would be the case if you were to deploy a single four-disk RAID 5 set. The beauty of RAID 50 lies in the "0" part of the RAID level; this is where information is striped across each of those underlying individual RAID 5 sets.
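The capacity arithmetic for the Figure A layout can be sketched in a few lines of Python. This is only an illustration: the function name, its parameters, and the 1 TB disk size are assumptions made for the example, not something from a particular controller or product.

```python
def raid50_capacity(sets: int, disks_per_set: int, disk_tb: float = 1.0):
    """Return (usable_tb, parity_fraction) for a RAID 50 layout.

    Each underlying RAID 5 set gives up one disk's worth of capacity to parity.
    The 1 TB default disk size is an arbitrary assumption for illustration.
    """
    if sets < 2 or disks_per_set < 3:
        raise ValueError("RAID 50 needs at least two RAID 5 sets of three disks each")
    total_disks = sets * disks_per_set
    data_disks = sets * (disks_per_set - 1)
    usable_tb = data_disks * disk_tb
    parity_fraction = sets / total_disks  # one parity disk's worth per set
    return usable_tb, parity_fraction


# Figure A: three RAID 5 sets of four disks each (12 disks total)
usable, overhead = raid50_capacity(sets=3, disks_per_set=4)
print(usable, overhead)  # 9.0 0.25
```

With 12 one-terabyte disks, the sketch reports nine disks' worth of usable space and 25% parity overhead, matching the figures in the text.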

There are a number of reasons why I like RAID 50, but there are also tradeoffs to using this RAID level. Here are some pros and cons about using RAID 50.

Disk space

RAID 5 dedicates one disk's worth of capacity per array to parity, which works out to 1/n of the raw space in an n-disk array. In Figure A, this would mean that, if all 12 disks were in a single RAID 5 set, you'd be left with 11 disks' worth of capacity. With RAID 50, you need to allocate one disk per underlying array for parity (three disks in Figure A, leaving nine disks' worth of capacity), so you're left with less usable space than you would have if you simply used RAID 5.

However, if you compare RAID 50 and RAID 10, you'll see a clear winner in RAID 50 from a capacity perspective. With RAID 10, you always lose 50% of your capacity due to mirroring. Since each underlying RAID 5 array requires a minimum of three disks (RAID 5 rules), and you lose the capacity of one disk to parity, you'll never "lose" more than 33% of your total capacity when using RAID 50. As you make each RAID 5 set larger, this loss percentage goes down. In Figure A, with four disks used in each RAID 5 set, 25% of capacity is used for parity overhead; if you make that five disks per RAID 5 set, this percentage drops to 20%. As this percentage drops, your risk increases, because each RAID 5 set then contains more disks that could fail together.
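To put the overhead percentages side by side, here is a small Python sketch. The function name and the level labels are made up for this example; the only facts it encodes are the ones stated above (one parity disk per RAID 5 set, 50% for mirroring).

```python
from fractions import Fraction


def capacity_overhead(level: str, total_disks: int, disks_per_set: int = 0) -> Fraction:
    """Fraction of raw capacity given up to redundancy (hypothetical helper)."""
    if level == "raid5":
        return Fraction(1, total_disks)        # one parity disk for the whole array
    if level == "raid10":
        return Fraction(1, 2)                  # mirroring always costs half
    if level == "raid50":
        if disks_per_set < 3 or total_disks % disks_per_set:
            raise ValueError("need equal-sized RAID 5 sets of at least three disks")
        sets = total_disks // disks_per_set
        return Fraction(sets, total_disks)     # one parity disk per RAID 5 set
    raise ValueError(f"unknown level: {level}")


print(capacity_overhead("raid5", 12))       # 1/12
print(capacity_overhead("raid50", 12, 3))   # 1/3
print(capacity_overhead("raid50", 12, 4))   # 1/4
print(capacity_overhead("raid50", 10, 5))   # 1/5
print(capacity_overhead("raid10", 12))      # 1/2
```

Using fractions rather than floats keeps the 33%, 25%, and 20% figures exact and makes it obvious that RAID 50 overhead depends only on the size of each RAID 5 set.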

RAID 50 requires an array with at least six disks -- two RAID 5 arrays of three disks each. I like to use three- or four-disk RAID 5 sets in RAID 50 arrays.

Risk

With RAID 5, as you increase the number of disks in the array, you increase the likelihood that you'll experience total array failure because more than one drive fails at the same time. As you move into RAID 50 territory, that additional disk space that you're giving up translates directly into lowered risk, as RAID 50 systems can suffer multiple disk faults -- as long as those disk faults happen in the right places.

With RAID 50, if you suffer multiple disk faults in any of the underlying RAID 5 arrays, the entire RAID 50 is toast; however, each individual RAID 5 array can withstand the loss of a single disk. You never want to have more than one disk go bad at a time regardless of RAID configuration, but at least with RAID 50, your chances are much better that a second disk failure will not happen in the same array as the first failure. This is one reason that keeping the individual RAID 5 arrays small (three or four disks at most) makes a lot of sense. The more disks you add to the individual RAID 5 arrays, the higher your risk for suffering a dual disk loss in one array.
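A rough way to see why small RAID 5 sets help: once one disk has failed, ask how likely it is that the next failure lands in the same set. The sketch below assumes every surviving disk is equally likely to be the next one to fail, which is a simplification (disks from the same batch may well fail together); the function name is made up for illustration.

```python
from fractions import Fraction


def second_failure_fatal(sets: int, disks_per_set: int) -> Fraction:
    """Chance that a second failed disk sits in the already-degraded RAID 5 set.

    Assumes each of the remaining disks is equally likely to fail next.
    """
    total_disks = sets * disks_per_set
    return Fraction(disks_per_set - 1, total_disks - 1)


print(second_failure_fatal(3, 4))   # 3/11, the Figure A layout
print(second_failure_fatal(4, 3))   # 2/11, same 12 disks in smaller sets
print(second_failure_fatal(2, 6))   # 5/11, same 12 disks in larger sets
print(second_failure_fatal(1, 12))  # 1, a single 12-disk RAID 5
```

Under that assumption, the same 12 disks go from a certain loss (single RAID 5) to roughly a 27% chance of loss with four-disk sets, and lower still with three-disk sets.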

Remember, the "0" part of RAID 50 offers no fault tolerance; all fault tolerance happens at the individual RAID 5 level. The RAID 0 part does help with performance.

Performance

RAID 50 does not perform as well as RAID 10 in a degraded state (i.e., during a rebuild), but RAID 50, at least theoretically, performs much better than RAID 5 in overall write performance. This places RAID 50 between RAID 10 (the winner in performance) and RAID 5 (sometimes lackluster performance, depending on workload) in the performance spectrum. Actual performance usually depends on the choice of RAID controller and the kind of information being processed.

Like RAID 10 and RAID 5, RAID 50 provides excellent read performance.

Summary

When it comes to achieving a balance between storage cost, risk, and performance, few RAID levels go as far as RAID 50 for the following reasons:

  • Storage. Although RAID 50 uses more overhead space than RAID 5, it requires much less overhead than RAID 10, making it a nice in-between choice.
  • Risk. With RAID 5 alone, organizations run the risk of a second disk failure that could compromise the entire array. RAID 50 mitigates this issue since multiple disks can fail, as long as the disks are the right ones.
  • Performance. Although overall read/write performance is highly dependent on a number of factors, RAID 50 should provide better write performance than RAID 5 alone.


About

Since 1994, Scott Lowe has been providing technology solutions to a variety of organizations. After spending 10 years in multiple CIO roles, Scott is now an independent consultant, blogger, author, owner of The 1610 Group, and a Senior IT Executive w...

31 comments
tkim

The problem with this article is that the author provided no math behind his theories. I will provide the math and show you guys that the probability of failure in a RAID5 is much greater than in a RAID50 -- more so than some of you may think. Using the minimum # of required drives (6), we will build a RAID 50 (2x3 drive arrays striped) and a RAID 5 (5+1 parity). Now...WITHOUT EVEN CONSIDERING the RAID level of both sets of 6 disks, we can safely conclude that the odds of losing a disk are the same in both sets. The actual odds of losing a disk are not important (I think someone mentioned 3%). What is important is that the odds of losing a disk do not change based on what RAID array it is participating in. In a 6 drive array (again, regardless of RAID level), there are 15 ways to lose two drives (assuming the order in which you lose the two drives is not important). The formula is this: n!/(r!(n-r)!), where n=6 (the # of drives in the array), and r=2 (the # of drives to fail). So, regardless of RAID5 or RAID50 or RAID10, RAID6, etc., there are exactly 15 ways to have a 2 drive failure. With me so far?
So here is why RAID50 is more resilient: Out of the 15 2-drive failure combinations, RAID50 can survive 9 of them! Yes, NINE! RAID5 can survive NONE! Math: 3^2, where there are 3 drives in each sub-array and the 2 failed drives land in different sub-arrays (3 choices in one sub-array times 3 choices in the other). What does this mean? It means that if and when you encounter a 2-drive failure on your 6 drive RAID50 array, you have a 60% chance of remaining online! Of course, the more drives you have in your array, the less your chances of staying online. However, once you introduce 3 sub-arrays in your RAID50, you not only spread your odds, you introduce another factor of failure which is sustainable - the 3-drive failure. Using the example in this article, you have 12 drives in a RAID50 (striped across 3 sub-arrays). The # of ways to lose any two drives is 66 (Formula: 12!/(2!(12-2)!)). A RAID5 array with 12 drives cannot sustain any of these 66 combinations of two-disk failures.
To calculate how many of these 66 combinations a RAID50 can sustain, we'll do the calculation in reverse (it's easier). Let's figure out how many of these combinations will break the RAID50. This RAID50 is made up of three 4-drive RAID5 arrays. If any of the three sub-arrays breaks, the entire RAID50 breaks. So we can calculate how many ways a 4-drive RAID5 array can break, and then multiply that number by three. So the formula is 4!/(2!(4-2)!)=6. Then 6x3=18. There are 18 ways that the RAID50 can break, which leaves 48 of the 66 combinations that it survives. That means that if two drives were to fail on this 12-drive RAID50 array, it has a 72.7% chance of staying online. A RAID5 has a 0% chance of staying online. Add to that the ability to sustain some combinations of 3-drive failures, and the resiliency argument for RAID50 is strengthened even more.
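The counting argument in this comment is easy to check by brute force. The following Python sketch enumerates every combination of failed drives and counts the ones where no RAID 5 sub-array has lost more than one drive. The function name is invented for this example, and treating every combination as equally likely is the commenter's assumption, not a statement about real drive behavior.

```python
from itertools import combinations


def surviving_combinations(sets: int, disks_per_set: int, failures: int):
    """Return (survivable, total) failure combinations for a RAID 50.

    A combination is survivable when no RAID 5 sub-array loses more than one drive.
    """
    drives = [(s, d) for s in range(sets) for d in range(disks_per_set)]
    survivable = total = 0
    for combo in combinations(drives, failures):
        total += 1
        per_set = [0] * sets
        for set_index, _ in combo:
            per_set[set_index] += 1
        if max(per_set) <= 1:
            survivable += 1
    return survivable, total


print(surviving_combinations(2, 3, 2))  # (9, 15)   -> 60% survive
print(surviving_combinations(3, 4, 2))  # (48, 66)  -> about 72.7% survive
print(surviving_combinations(3, 4, 3))  # (64, 220) -> three-drive failures
```

The first two results reproduce the 9-of-15 and 48-of-66 figures. The third shows that the 12-drive layout also survives 64 of the 220 possible three-drive failures, namely those with exactly one failure in each sub-array.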

dfrueh

Is it just me, or is a single RAID 5 array with a hot spare a good way to go? That's been our setup, and we've never even needed the RAID to rebuild. It simply copies to the hot spare, which becomes part of the array, no extra overhead required on the disks, and we can come in to a beeping drive in the morning which we can have replaced next day. Seems like a no-brainer to me. I haven't much looked into RAID 6, but 5 with a hot spare is working well.

jhoward

RAID 10 is my first choice for all of our business storage needs, since our NMS and CDR data is constantly being written to and also needs to be highly available. RAID does not take the place of backups, but in all seriousness the RAID levels above 0 exist to reduce the need to restore backups, which is a time-consuming exercise I have had to do way too often in the past. If you have a large space requirement and the time to restore backups or deal with a degraded array, then RAID 5/50 makes sense. If not, the cost of a RAID 10 easily pays for itself the first time a disk fails in the middle of the day.

ITSPL

You can say that, but ultimately you are still using the RAID 5 array pattern. It is successful in large storage systems, but the risk of multiple failures is not ruled out. No doubt it beats other RAID levels in performance and storage capacity. RAID 6 dual parity does not come into play in this format. Gurbinder Sharma, IT Professional.

tjscott007

Scott: Why not RAID 60 as your favorite? With your example, you could use two six-disk RAID 6 sets striped together. This gives you the advantage that three disks in one RAID 6 set have to fail before you lose data. And your percentage lost is 1/3, as opposed to the RAID 50 loss of 1/4 in your example. To me, considering the well-discussed idea that disks fail in bunches, RAID 60 gives much more redundancy than RAID 50. For sure the performance of RAID 60 is better than that of RAID 6, but not quite as good as RAID 50 or RAID 10.
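The trade-off raised in this comment (more overhead, more tolerance) can be put into numbers with a short Python sketch. The helper below is hypothetical; it generalizes the nested layout to any number of parity disks per set and, like the other counting arguments in this thread, treats all failure combinations as equally likely.

```python
from fractions import Fraction
from itertools import combinations


def nested_raid_summary(sets: int, disks_per_set: int, parity_per_set: int, failures: int):
    """Return (capacity_overhead, share_of_survivable_failure_combinations).

    parity_per_set is 1 for RAID 5 sets (RAID 50) and 2 for RAID 6 sets (RAID 60).
    """
    total_disks = sets * disks_per_set
    overhead = Fraction(sets * parity_per_set, total_disks)
    survivable = total = 0
    for combo in combinations(range(total_disks), failures):
        total += 1
        per_set = [0] * sets
        for disk in combo:
            per_set[disk // disks_per_set] += 1  # disks are numbered set by set
        if max(per_set) <= parity_per_set:
            survivable += 1
    return overhead, Fraction(survivable, total)


# 12 disks as RAID 50 (3 x 4) versus RAID 60 (2 x 6)
print(nested_raid_summary(3, 4, 1, failures=2))  # (Fraction(1, 4), Fraction(8, 11))
print(nested_raid_summary(2, 6, 2, failures=2))  # (Fraction(1, 3), Fraction(1, 1))
print(nested_raid_summary(3, 4, 1, failures=3))  # (Fraction(1, 4), Fraction(16, 55))
print(nested_raid_summary(2, 6, 2, failures=3))  # (Fraction(1, 3), Fraction(9, 11))
```

On these assumptions the RAID 60 layout gives up 1/3 of capacity instead of 1/4, survives every two-disk failure, and survives 9 of every 11 three-disk failure combinations.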

S,David

Maybe I am missing something, but it seems that if I have a risk of two drives failing in a RAID 5, by adding additional RAID 5 under a RAID 0, I still have the same risk of the same two drives failing in a single RAID 5 and taking out everything. I don't see how RAID 50 changes the probability of two drives in a particular RAID 5 failing at the same time.

dlovep

To be honest, I read some white papers done by Toshiba a few years ago, comparing RAID-0, 1, 5, RAID-10, RAID-50 and RAID-60, with a very large amount of testing done for many applications. RAID-5 and RAID-50 are only good for web hosting or something that merely requires "WRITE". So RAID-50 is not the winner; as I recall, it should be RAID-60 or RAID-10.

JamesKelley

Not to mention an increased risk of a single controller failure knocking the entire thing out.

batman7777

As for probability: let's say you have 10 drives (for simplicity). If any 2 of those 10 fail, your probability is a 20% chance of failure (if you assume that on average 2 disks out of every 100 will fail). Thus, if you have two sets of 5 disks in RAID 5 (which are then striped, i.e. RAID 50), your probability has dropped from 20% to 10% per array. The probability of 2 disks failing out of every 100 disks has been reduced because the same chances have now been distributed over another array, reducing your probability of it happening within the same array, as it would when you use only 1 array. Make sense? You may need a better understanding of probability if this is not clear :) I actually thought of this idea about 10 years ago. I wanted to set up 3 RAID 5 arrays with 9 disks each and then use Windows to create a stripe set :)

hjmangalam

I believe this is the second post the original author has made on this strange storage decision. If you're in charge of an org's storage and you're not going to go full-on Isilon or similar, why not pony up the extra disk for a great deal more robustness and equivalent speed? The XOR calculations are hardly a bottleneck for a dedicated processor (I assume you're doing this with hardware controllers and not MDADM), and odds are that when one disk fails in a R5, the rebuild process will cause maximum stress that may contribute to another disk failure in the same RAID, especially if all the disks are from the same production run and they've all aged similarly. Putting all this onto R5 when there is well-developed R6 technology (almost certainly in the same controller) just seems ... well, the polite word is 'inexplicable'. Or use ZFS.

PaulMarchant

Instead of having 1 large RAID 5, you have a number of small (3 or 4 disk) RAID 5 arrays. So it's possible for the 2nd disk failure to occur in one of the other arrays. Hence the odds of a catastrophic 2 disk failure have been reduced (but not eliminated).

hjmangalam

As you say, if the world of probability distributes failures the way you prefer, then you could lose 3 disks spread equitably across your R5 arrays without losing data. However, if failure does not occur the way you prefer (overwhelmingly the case), then if you lose 2 disks in 1 array, your data (mod backups) is gone. Whereas with R6, you can lose 2 disks in ANY array and still maintain data. As well, to accept other parts of your argument, you are assuming a gaussian distribution of failure, which is usually not the case (as the Google and UMD studies of large populations of disk failures show). As someone else mentioned, disk failures, due to parallel aging, bad batch runs, overheating, etc., tend to occur in repeats in a failing RAID. Even if you are correct on the stats, the real-life expectation of failure profiles should argue for R6, which, given an optimum number of spindles, cache, and hardware controllers, should be competitive with R5. What am I missing here? Happy to be taught.

S,David

I don't agree. For a given quantity of drives there will be a probability that two of them will fail at the same time, where "the same time" is defined as the period of time it takes to rebuild a single drive in a RAID 5. Simply dividing the drives up into subsets does not change that probability. And, without prior knowledge, there is no way to know which of the drives will fail, so there is no way to divide the failures among the subsets. A double drive failure in a RAID 5 will kill the array, and if that RAID 5 is in a RAID 50 it will kill the RAID 50. There is no change in the level of risk.
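One way to test this claim is a back-of-the-envelope model. The Python sketch below assumes each drive fails independently with the same probability p during the window of interest (for instance, a rebuild period); both the independence and the value p = 0.03 are assumptions for illustration, and several commenters point out that real failures are often correlated. The function names are made up for the example.

```python
def set_loss_probability(disks: int, p: float) -> float:
    """P(two or more of `disks` fail), with independent per-disk failure probability p."""
    no_failures = (1 - p) ** disks
    one_failure = disks * p * (1 - p) ** (disks - 1)
    return 1 - (no_failures + one_failure)


def raid50_loss_probability(sets: int, disks_per_set: int, p: float) -> float:
    """P(at least one RAID 5 set loses two or more disks), sets assumed independent."""
    per_set_survival = 1 - set_loss_probability(disks_per_set, p)
    return 1 - per_set_survival ** sets


p = 0.03  # assumed per-disk failure probability, purely illustrative

# Chance that ANY two or more of 12 drives fail; this is the same for every layout
print(set_loss_probability(12, p))
# Chance that the failures actually destroy a 3 x 4 RAID 50
print(raid50_loss_probability(3, 4, p))
```

In this model the chance that some two of the 12 drives fail is identical whatever the layout, which is the point being made here. What changes is how often such a double failure is fatal: always for a single 12-disk RAID 5, but only when both failures share a sub-array for RAID 50, so the second number comes out smaller than the first.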

MikeGall

Over a two-year period, I dealt with 6 units from a vendor who shall remain nameless, with a total storage of 120TB usable in a RAID 6. I had about 5 disk failures, 3 of which resulted in a second failure during the restore. Sometimes I had to restart the disk array to get the SAN fabric to detect the device again, and when it came back up from the power cycle ... voilà, another bad disk. RAID 6 saved the day.

batman7777

RAID 5 can survive a single disk failure in each array. Therefore, if the data is striped across underlying RAID 5 arrays, each RAID 5 array manages its own data redundancy. Thus, you could potentially lose up to 3 disks (one in each array).

S,David

If I have a single RAID 5 array, there is some probability that it will suffer a double drive failure. If I have two identical RAID 5 arrays, the probability of the second having a double drive failure is the same as it is for the first array. It doesn't change if they are in a RAID 50 or as separate arrays in separate machines.

brianalls

I like the concept of RAID50 for reads, but I'm wondering if you're actually doubling the RAID 5 write penalty. So where you are looking at 5 or 6 physical IOs per write with a RAID5, that would become 10 to 12 IOs with RAID 50. I guess it partially depends on the array's firmware, because there are a lot of ways you might be able to optimize RAID 50. With RAID 6, write performance should be pretty similar to RAID 5, with fewer physical IOs and less wasted space. And you still have protection from a double drive failure.

tkim

"But you can also argue that there's an equal chance that the same name could be drawn again, LOWERING your chances. This is also correct. ...the odds against the same name being drawn again are much higher." That is incorrect, and is the typical weekend gambler's mentality. I've seen many foolish people sit idle by a roulette table and watch for long streaks of black or red, then pounce on the opposite color when a long streak has been reached -- incorrectly thinking that the odds of it landing on the other color is much higher -- since it rolled black (or red) 10 times in a row. This is WRONG! Any time you allow for repetition, the odds remain the same on EVERY spin of the wheel (or "flip of the coin", or "name draw out of a hat", etc.)

batman7777

Hi David, I am truly sorry about my response. I got ticked off the same way you did about my message. I have ADHD, which I don't want to use as an excuse, and my medication wears off around that time of the evening, so I have an uncontrollable temper which has brought me into trouble before. Regardless, I am truly sorry about that response. You did not deserve it. I hope you will accept my apologies. Regards, Bronwen

batman7777

Hi, I want to apologize, as I responded on impulse and said things which were inappropriate. I have removed some remarks, and I hope that anyone who read that will accept my apology. Regards, Bronwen

S,David

Wow. Do you always insult people that don't accept your word as gospel from on high? Really, insulting someone, then saying "no offense" does not cut it. If you want to impress me, explaining where I went wrong in a calm manner with examples will go a lot farther. Calling me a five year old with a desire for self-abuse will not inspire me to accept even a small fraction of your explanation. Your explanation of the drive issue might even be right, but I'll never know, because I quit reading after "5 year old."

info

Bronwen was a bit on the 'overenthusiastic' side during his rant. (Okay, he came across as a bit of a nut, but he's a hardware designer, so that explains things. ;) ) But he's technically right. It's all in how you play statistics and probabilities, and you can make those numbers jump to fit any theory you want. For instance, a local hospital states their yearly lottery gives you a '1 in 20' chance of winning, because they put each name back in after every draw. This is correct. But you can also argue that there's an equal chance that the same name could be drawn again, LOWERING your chances. This is also correct. However, when you're looking at a near-random situation, the odds against the same name being drawn again are much higher. This is similar to the RAID argument. Now, a single RAID5 array is usually built from similar drives, usually from the same production run if it's fully built from the start, so this will tweak the odds of a same-array failure up a lot. But with multiple arrays the chance of that type of failure decreases. Enough so that I would wager money against a RAID50 going down on the arguments presented, but much LESS money on a RAID5. I've had good luck with RAID5, but I work with smaller arrays on non-performance critical servers. As for a same-array failure taking down the RAID50. In Life, stuff happens (and I'm thinking about a word other than 'stuff'). But isn't this why we make backups? So for a limited budget, I'd say RAID50 looks pretty good. I'll be considering this over RAID10 for my next build.

santeewelding

As an exercise in technical intellect. At first (above), I sympathized with you. Now, I don't, as a matter of pure intellect. You need to be kept indoors for your own good.

batman7777

** Removed some of my venting, which was inappropriate, and I'm sorry about that. ** Nowhere did I say 2 disk failures in the same RAID 5 configuration will be OK, so there is no need to emphasize it. I'm with you on that. Keep in mind that this is only an example. Take a look: FACT: 3% of disks WILL fail. Look it up online... Mean time to failure (MTTF) will vary by manufacturer, temperatures and other conditions; however, 3% is the average proven by conducted tests (some over a period of 5 years). Now, let's prove that there is a difference in probability between RAID 5 and RAID 50. Say you have 100 000 of those disks in 1 single RAID 5 array (hypothetically). 3% of those 100 000 disks is 3000, which WILL fail at ANY time over the next 5 years (the 5-year expected lifetime of a hard disk, on which the probability of failure is calculated). Now, I have another 10 000 RAID 5 arrays of just 10 disks in each array, all striped (RAID 50), again hypothetically: 10 000 x (10-disk RAID 5 arrays). Each of those 10-disk arrays has an expected 0.3 disks failing (3% of 10 disks). Over a 5-year period, that's only 0.3 disks in each array that will fail. The total of 3000 failed disks over 10 000 RAID 5 configurations will still be 3% of the disks, and they will still fail. The probability of 2 disks failing in a 10-disk RAID 5 array is much less, because there are 7000 arrays which will not even have a single failure. However, in the other single array of 100 000 disks, as soon as more than one disk fails, which is a certainty since 3000 will fail, the entire RAID 5 array IS DOWN! Thus 1 array is NOT equal to 10 000 smaller arrays, because 7000 of them are still standing. I have repeated it enough (I hope). Good luck.

hjmangalam

I present this short play as a demonstration of why some of us have trouble accepting your view of probability: Probability: Excuse me, terribly sorry to interrupt your day, but it's me, Probability. I'm afraid I have to take 3 disks from your storage array. I see you have 3 R5s spinning quite happily there in an R50. Where would you like your failures to occur? You: Oh, thanks for the heads-up - it would be really convenient if you could take 1 disk from each of the R5s. Wouldn't want to lose any data - just processed about a billion $ worth of drug trial data and it would be really inconvenient to lose it. Probability: You sure about that? You wouldn't like me to take 2 disks from 1 R5 and 1 from another? You: Oh god NO! that would certainly ruin my day and a lot of other peoples'. I'm absolutely sure - 1 from each. Probability: You know, I normally wouldn't do this for just anyone, but since you make such a good argument for it - wouldn't want to wreck anyone's day - I'll fail them just as you wish - one from each R5. You: Oh, jolly good show! Thanks a bunch, Prob! What a nice chap you are! Have a dead parrot? ------- curtain closes ------- In my universe, probability doesn't play quite so nice. I think that bayesian stats also plays significant role in how you should consider your risk and therefore the technology to address it. ie if demonstrable real-world events or data show that failure does not occur with a gaussian distribution, you would be remiss in not addressing the skewed risk.

Clayton L.

You don't increase the risk of failure, because the rate of failure is based on the individual RAID5 set. A set of 3 RAID5 arrays has less probability of failure than 1 RAID5 of 9 disks. You have to consider what you're comparing your failure rate against. You can have 1 disk in each of the 3 RAID5s fail without data loss, whereas 3 disks failing in a single 9-disk RAID5 array does mean data loss. Keep it in perspective.

CS10

The 3 RAID5 sets are NOT mirrored, they are striped. If one fails, the whole stripe set fails. For a RAID5 to fail, you need two disk failures in the RAID5 set. A set of 3 RAID5 arrays has more probability to fail than just one. So you increase the risk of failure, as well as the performance. To summarize, I am not convinced at all...

rmaul

Yes, it does. Remember these are "mirrored" arrays. There has to be a four drive failure before the rebuild can occur before you have a problem in a two array set. In the example, you must have a six drive failure to bring the entire array down. As long as one array is intact, the system is up - although degraded. Just make sure your drives are not all of the same run from the manufacturer. Have seen that before. Most two drive failures in a pre configured system come from the fact that the manufacturer had a bad run. Been through that multiple times in servers from a well known manufacturer. Warranty always replaced the drives, but that isn't the issue. The example does pose some overhead on write operations, but is about as safe as it gets - in my opinion. RAID 60 would be better, but the additional overhead might take it out of consideration in a heavy write environment. Have multiple RAID 6 arrays, in an environment where there is a substantial "write" out of normal business hours, then the "read" during normal hours. You just have to work through what works best in the situation you have.

batman7777

You obviously missed the fact that I said RAID 5 peaks at around 9 disks before you start losing the value of adding more disks. Thus, assuming that the max number of disks in a RAID 5 array is 9, you will benefit from using 3 x 4-disk arrays (totalling 12 disks), which you could not use before. However, assuming you could use 12 disks, I would agree with what you're saying. But realistically (and I did say, IF this is still true), you would not be able to compare apples with apples. :)

radio1

In your math, you mention a 9-disk RAID 5, a 12-disk RAID 50 and a 10-disk RAID 6. Using your math, a 12-disk RAID 6 would give you the performance of (12-2)*10=100. That is higher performance than the RAID 50 and also has more space. Also, the RAID 5 at 12 disks would be (12-1)*10=110. Again, higher than RAID 50. The benefit of RAID 50 is not the performance increase over RAID 5, but the increased fault tolerance. However, the chances of a failed disk are higher during a rebuild, when all of the sectors are scanned. This is true for RAID 5, 50 and 6. The difference is that a RAID 6 can tolerate 2 disk failures anywhere.
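The arithmetic being traded in this exchange follows a very simple model: throughput is the number of non-parity disks times a per-disk rate. The Python sketch below reproduces only that model. The function name is hypothetical, the 10 MB/s figure is the commenters' own round number, and the model ignores parity-update costs, controller limits, and caching, so it says nothing about real-world performance.

```python
def naive_throughput(total_disks: int, parity_disks: int, per_disk_mb_s: int = 10) -> int:
    """Commenters' simplified model: data disks times per-disk rate, in MB/s."""
    return (total_disks - parity_disks) * per_disk_mb_s


print(naive_throughput(9, 1))    # 80  -> 9-disk RAID 5
print(naive_throughput(12, 3))   # 90  -> RAID 50, three 4-disk sets (one parity disk each)
print(naive_throughput(10, 2))   # 80  -> 10-disk RAID 6
print(naive_throughput(12, 2))   # 100 -> 12-disk RAID 6
print(naive_throughput(12, 1))   # 110 -> 12-disk RAID 5
```

Laid out this way, the disagreement is visible at a glance: the RAID 50 figure only comes out ahead when it is compared against arrays with fewer disks.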

batman7777

If you have RAID 5 with 12 disks, you would probably be on the downward curve in performance, as I recall 9 disks to be optimal for RAID 5 (as the max disks per array). So, assuming that's still true: 9 disks @ 10MB per second (just a simple example) gives 9 x 10 = 90, minus 10 for parity = 80MB per second of usable data written (let's ignore the .01 seconds of calculations etc.). Now, 3 x 4-disk RAID 5 arrays (striped) give 3 x 30 = 90 (already reduced by the 3 x 10MB parity disks). So this, striped, exceeds the performance of RAID 5 on a single array, plus it reduces the probability of multiple disk failures in a single array and adds the ability to withstand conditional multiple disk failures. I'd say all in all, it would kick RAID 5 to the curb. :) Now, RAID 6: assuming 10 disks, 2 for parity and 8 for data, writes would be reduced by 2/n instead of 1/n, so it may be worth calculating in more detail, but 8 x 10MB per second = 80, still less than RAID 50 :) RAID 5 and 6, I think, would be limited to the performance peak, whereas RAID 50 may have other hardware bottlenecks at some point.