
RAID 5 or RAID 6: Which should you select?

Many RAID controllers now support both RAID 5 and RAID 6. IT pro Rick Vanover explains what RAID 6 is and when to select it over RAID 5.

When it comes to architecting a storage solution, planning is key. In addition to selecting the storage protocol (iSCSI, Fibre Channel, NAS, etc.) and disk type (SAS, SATA, SSD, etc.), you should give serious thought to which RAID level to use.

I used to keep it simple and stick to either RAID 1 or RAID 5 when I wasn't working with larger amounts of storage. However, when provisioning larger amounts of storage in a SAN or NAS environment, it may be worth selecting RAID 6 over RAID 5. To help you understand how RAID 5 and RAID 6 differ, we'll explain each RAID type.

RAID 5 is an array with parity blocks distributed across all of the drives. Figure A is a representation of RAID 5.

Figure A: RAID 5

The grey blocks are the parity blocks distributed through the array. RAID 5 takes a minimum of three drives to implement; this example uses four drives to give an easier visual comparison to RAID 6.
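
To make the rotating-parity idea in Figure A concrete, here is a minimal sketch in Python. The rotation order it uses is just one possible layout chosen for illustration; real controllers differ (left- vs. right-symmetric layouts, for example).

    # Sketch: a rotating-parity map for a 4-drive RAID 5 set, one row per stripe.
    # "D" is a data block, "P" is the parity block for that stripe.
    # The rotation order is only an example; real controllers vary.

    def raid5_layout(num_disks: int, num_stripes: int) -> None:
        for stripe in range(num_stripes):
            parity_disk = (num_disks - 1 - stripe) % num_disks  # rotate parity each stripe
            row = ["P" if disk == parity_disk else "D" for disk in range(num_disks)]
            print(f"stripe {stripe}: " + " ".join(row))

    raid5_layout(num_disks=4, num_stripes=4)
    # stripe 0: D D D P
    # stripe 1: D D P D
    # stripe 2: D P D D
    # stripe 3: P D D D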

RAID 6 uses two independent parity schemes to maintain array integrity. Figure B is a representation of RAID 6.

Figure B: RAID 6

The grey blocks are the parity blocks; the two parity algorithms are stored in separate blocks. RAID 6 has more overhead in terms of usable storage relative to raw capacity, as well as a more complex RAID controller algorithm. According to the AC&NC RAID information sheet, RAID 5 has better write performance.
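
A quick way to see that overhead is to compute it: whatever the set size, RAID 5 gives up one drive's worth of capacity to parity and RAID 6 gives up two. A minimal sketch (the drive counts and the 2 TB size are example values only):

    # Sketch: usable capacity for RAID 5 vs RAID 6. RAID 5 spends one drive's
    # worth of space on parity, RAID 6 spends two, whatever the set size.

    def usable_tb(num_disks: int, disk_tb: float, parity_disks: int) -> float:
        if num_disks < parity_disks + 2:
            raise ValueError("not enough disks for this RAID level")
        return (num_disks - parity_disks) * disk_tb

    for n in (4, 8, 16):
        raid5 = usable_tb(n, 2.0, parity_disks=1)
        raid6 = usable_tb(n, 2.0, parity_disks=2)
        print(f"{n:>2} x 2 TB drives: RAID 5 = {raid5:.0f} TB, RAID 6 = {raid6:.0f} TB")

    #  4 x 2 TB drives: RAID 5 = 6 TB,  RAID 6 = 4 TB
    #  8 x 2 TB drives: RAID 5 = 14 TB, RAID 6 = 12 TB
    # 16 x 2 TB drives: RAID 5 = 30 TB, RAID 6 = 28 TB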

I prefer building for RAID 6 in spite of RAID 5's write performance advantage. This is primarily because RAID 6 rebuilds, while longer, are less impactful on the disks, which is an important factor now that 2 TB and larger disks are in use in many NAS and SAN storage systems. The extra parity also gives RAID 6 additional protection against block failures and controller errors. If zero data loss is a priority, this is something to consider; I've seen controllers suffer logic errors and cause data loss even when the actual disks were healthy.

What is your take on RAID 5 vs. RAID 6? Share your comments below.


About

Rick Vanover is a software strategy specialist for Veeam Software, based in Columbus, Ohio. Rick has years of IT experience and focuses on virtualization, Windows-based server administration, and system hardware.

69 comments
jmjosh

For all those who claim it's better to have RAID10 (or 1+0, or 0+1) than RAID6:

If you have the basic config with 4 drives - RAID10 wins with higher speeds, but RAID6 wins with higher security (with RAID6 ANY 2 disks can fail, while in RAID10 the wrong 2 failed disks can kill your data).

Once we start increasing the number of drives, RAID10 still wins on read/write speed, but it can still fail if the 2 disks of one mirror fail. This becomes significant because the more drives we have, the higher the probability of a disk failure. And since it's rather hard to get, let's say, 40 drives each from a different production series, the probability of 2 disks failing at the same time is quite high.

At the same time, RAID10's space efficiency falls dramatically - with 12 4TB disks we get 40TB of space with RAID6 but only 24TB with RAID10.

So, I'd rather go for RAID6 with some hotspare in order to have reasonable speed and data security.

Actually, I wish there were some RAID-6 like plans for bigger arrays, that would protect from 5-6 disks failing at the same time.

If I were about to create an array from 100 4TB drives, I'd rather split it into 10 RAID-6 arrays giving 320TB, to reduce the risk of losing all data when 2% of the drives somehow fail, at the cost of 20% of overall drive capacity. A RAID scheme that consumed 10% of capacity yet kept data intact even when 10 disks fail would be great - no need to split the data, etc.
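
Those space figures are easy to sanity-check; a minimal sketch using the drive counts and sizes from the examples above:

    # Quick check of the space figures above: RAID 6 loses two drives per set
    # to parity, RAID 10 loses half the drives to mirroring.

    def raid6_tb(disks: int, size_tb: float) -> float:
        return (disks - 2) * size_tb

    def raid10_tb(disks: int, size_tb: float) -> float:
        return (disks // 2) * size_tb

    print(f"12 x 4TB: RAID 6 = {raid6_tb(12, 4)} TB, RAID 10 = {raid10_tb(12, 4)} TB")
    # 12 x 4TB: RAID 6 = 40 TB, RAID 10 = 24 TB

    # 100 x 4TB drives split into 10 RAID-6 sets of 10 drives each:
    sets, per_set = 10, 10
    total = sets * raid6_tb(per_set, 4)
    overhead = (sets * per_set * 4 - total) / (sets * per_set * 4)
    print(f"10 x 10-drive RAID 6 sets: {total} TB usable, {overhead:.0%} spent on parity")
    # 10 x 10-drive RAID 6 sets: 320 TB usable, 20% spent on parity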

ganesh.borhade

Hello Rick, what is the impact on file server read performance when we use RAID6 in place of RAID5? How can we compensate for the write performance hit of RAID6? Can we add new disks? Regards, Ganesh

vivekgreets

When you say large storage for RAID6, how much storage are you referring to? I would like to have an idea of the range, please.

drsw

The company for which I work has thousands of backup appliances in the field from which we get a tremendous amount of data concerning this subject. After years using RAID-5, we now ship only RAID-1 in our low-end appliances, RAID-10 in our medium-end appliances, and RAID-6 (actually, RAID-1/RAID-6) in our higher-end appliances. In the real world, with larger disks, the major reason that you end up going to RAID-6 over RAID-5 is the concern with respect to drive failure during in-operation rebuild. __________________ Mark Campbell http://www.unitrends.com/

nick.ferrar

I don't understand the reasoning in the original article: why sacrifice performance to reduce disk rebuild times (a rare occurrence)? If you need more redundancy and quicker rebuilds than RAID 5, you should be using RAID 10. Most storage vendors I've discussed it with hate the idea of RAID 6; they just had to add support for it because their competitors supported it, and poorly informed end users bought into the smoke and mirrors.

jeslurkin

Glad to learn of newer RAID configs.

Rick_hayward99037

Many times, the choice of RAID comes down to software and hardware support. When RAID-6 isn't supported, it becomes a choice of "use what you have" vs. "buy a new controller card." After our last non-recoverable RAID-5 experience, we opted to do RAID-10. Our controller card could do it, but not RAID-6, and we would have enough storage in that configuration with the hard drives we had. RAID-10 is very redundant, so very hard to kill. It could also be considered wasteful, but drives are cheap too. The actual read/write speeds will depend on the controller card's processor and the hard drives used.

jjbueyes

RAID 5 is better when used on Database systems, and may be less expensive

ScarF

Drawing the correct image for RAID-5 with 4 disks, eh? The distributed parity requires 1/N of the space on each disk, where N is the number of disks in the array. The same remark applies to RAID-6. Since you have 4 disks, each disk should be divided into only 4 areas. Otherwise, the point about the advantage of using RAID-6 for large amounts of data is correct.

Leo

Considering that I'm not going to be rebuilding disks more often than I will need to write to them, I will still choose RAID5. Performance is more important to me than rebuild times. Plus, the added cost just isn't justified. -Leo

dr.funkenstein01

Hi, has anyone used any of the DROBO products? What are your views on DROBO?

antermyhome

I think RAID 5 is better for smaller companies where data is a lower priority than availability. On the other hand, RAID 6 is better for large data storage, where every bit of data is critical.

rajesh.pillai

With disks now having terabytes of capacity, rebuild times also increase; RAID 6 can reduce this risk.

klapper

Agreed, we've got a Promise vTrack with 16 1TB drives in a RAID6 configuration. We do lose some disk space and there is probably a slight performance hit, but the rebuild and fault tolerance benefits make it worth it. These 16 drives were manufactured very close to each other. If one drive dies, we think the probability of a second drive dying increases. RAID6 helps with that issue.

jschlaf1

Recently I had a 5+ TB array in RAID 5 for archiving data. The write performance was terrible and a rebuild took several days. In addition, the GUID Partition Table was not recognized by older clients. I replaced it with multiple RAID1 arrays; I still have the redundancy and get better all-around performance.

GDF

The way I remember this (and it's been a while), with RAID 5 you need one extra drive, and with RAID 6 you need two. So for large arrays the difference in overhead (i.e., drives purchased vs. storage available) is not significant. Then in the case of a drive failure - which is still the most common failure mode for RAIDs - with RAID 5 you're running unprotected and need to get a replacement drive in "right now"; with RAID 6 you still have RAID-5-equivalent protection and can replace the failing drive on a more practical schedule.

renodogs

You lightly touched on one major aspect of today's drives and the inherent problems within: error correction on perfectly good drives. Simply put, there's a massive amount of overhead on today's drives given the manufacturer's acceptable hardware tolerance for error correction. Read your drive specs; I think you'll be surprised. The rule of thumb is that the larger the drive, the larger the allotted error correction time, which equals CPU performance overhead.

Life's great when we can store the equivalent of the Library of Congress on a few arrays. Nonetheless, one cannot simply ignore the data errors that exist on perfectly functioning drives. IT folks who don't factor in the leakage of errors are doomed to long hours of chasing the blame game from the upper echelon screaming "I thought this system was going to be reliable!!!" Well sir, it is, sort of. Factor in aging drive spindle noise and worn internal mechanical components, and you have all the makings of a king-size headache.

The bottom line is this: in any disk array, there are trade-offs in performance, integrity, and finally, cost. Therefore, the choice of which array and which architecture to select must come down to two things. First, data integrity: just how good is it? Second, cost: you can't ignore the CFO and their insistence on staying within a sane budget. Anything less and you will fall into that age-old 'head in sand' trap, and it WILL come back to bite you.

phatty

RAID 6 rebuild times are slower than RAID 5. This article is incorrect.

byoung

If zero data loss is the only goal, then RAID 6 is more efficient than RAID 1, which has 50% overhead. But where 100% uptime is required, you still can't beat RAID 1. Perhaps a RAID 1+6 solution, while expensive, is the best for large storage systems.

feenberg

I would be interested in learning why rebuild times should be faster with Raid 6. It doesn't seem obvious, as the rebuild process still has to read or write the same number of drives with raid 5 or raid 6 - doesn't it?

juanfermin

Assuming your system bus can handle the extra bandwidth, the more drives you have, the better the read performance. Say you have SATA II 300Mbps disks: with a minimum 3-drive RAID 5 you're reading data off 2 disks simultaneously, so your actual throughput is closer to 600Mbps, and with a 5-drive RAID 6 (3 data drives per stripe) it's closer to 900Mbps. Unless a drive has failed, read requests don't touch the parity blocks, of which RAID5 has one per stripe and RAID6 has two.

david

With RAID 5, you lose the space of one drive (for parity). With RAID 6, you lose the space of two drives. For example, using RAID 5 in an array with (16) 1TB drives, your total usable space would be 15TB. Using RAID 6, your total usable space would be 14TB.

b4real

Too many organizations can't justify the overhead, or more importantly the funds, to get that extra level -> But you are correct, RAID 1+0 is the next best way to go.

juanfermin

I once had 2 disks fail at the same time, and another time a second disk failed during a rebuild, so sometimes RAID6 DOES make sense.

michaellashinsky

I have to admit, I did not understand your point. Are you saying that error correction is needed, or not needed? Please bring it down a notch for me, since I am trying to understand, but I do not yet.

radio1

I agree that RAID 5 is dangerous. Often, while the controller is hammering the drives during a rebuild, it notices that another drive has failed. RAID 10 can tolerate up to 2 drive failures, but not just any 2 drives; certain conditions must be met: as long as the two failed drives are not in the same mirror, data will not be lost. RAID 6 can handle ANY 2 drive failures; it doesn't matter which 2. Add in background consistency checks and RAID 6 offers very high reliability. RAID 6 also offers higher performance by using all drives for reading and writing, not just half. As far as the performance drops go, they are usually a constraint of the processing in the controller, while RAID 10 is constrained by the hard drive transfer speeds. A good controller in a RAID 6 outperforms a RAID 10 if all other factors remain the same. I don't pay much attention to online benchmarks; I do my own testing.

maj37

Interesting and informative, but this treatise ignores the CRC built into the drives, which is designed to prevent the drive from returning garbage. This CRC is a drive-level function, not a RAID controller function, so it is done independently of any array hardware. maj

LouCed

Thank you for the link. I'm a 1+0 guy from now on!

bboyd

Thanks for pointing the way. Raid Five & Six are compromises that are not needed. All that money spent on hardware to achieve reliability that induces failure points on its own.

jjcanaday

No redundancy? I don't see them advocating for any type of data safeguarding. Do they think nightly backups are enough? Maybe I didn't take enough time (and I didn't download any of the PDF docs), but all I can see is them advocating AGAINST RAID-3, -4, and -5. Is RAID-0 or -1 OK?

rhys

If I have it right, then you suggest a bunch of disks each in a RAID 1 mirror, then a RAID 6 spanning the mirrored disks. I can see there is greater protection than RAID 10 or 15, but what is the actual difference in risk between RAID 15 and RAID 16?

For total failure: RAID 15 requires a minimum of four failures in less than the time required for two consecutive mirror rebuilds. RAID 16 requires a minimum of six failures in less than the time required for three consecutive mirror rebuilds. I haven't seen a RAID 1 rebuild happen in production and I don't have stats on the failure rate of mirror rebuilds. I have never heard of a RAID 1 rebuild failing, but I am junior.

RAID 16 requires a minimum of 8 disks. The probability of losing both disks of one of the mirrors in the overlying RAID 6 is 1/7th of the probability of a mirror rebuild failure, or lower. Now you have the same resilience as RAID 15 (or near enough). This is the advantage of RAID 16 over RAID 15.

Given the conditional probability for total failure of RAID 15, what risk are we looking at here? Is the extra layer of RAID 16 really adding protection that you need? High availability is HA, but what are you aiming for that you need RAID 16 rather than RAID 15? Can anyone give an expected failure rate for RAID 15 or RAID 16?

Kerah

and you dont know ,......oh some of blindness will be useful in that situation as it like the big and horrible start

jc2it

Try to avoid RAID5 and RAID6 like the plague. Use RAID10 (or RAID 1+0) instead. S.A.M.E.: Stripe and Mirror Everything. It will provide the best performance and the best reliability. The extra cost of RAID10 is not enough to justify accepting the reduced reliability and increased recovery time of RAID5 or RAID6.

kristain

Hard disk capacities are rapidly increasing, and so are RAID rebuild times. The problem is particularly apparent with SATA drive technology, but as 300 GB Fibre Channel drives become more prevalent, even Fibre Channel-based arrays are suffering from long array rebuild times. Many storage administrators are using fewer and fewer drives in an array group. Although using larger capacity drives may make economic sense, fewer drives equate to longer rebuild times. A RAID 5 array with five 500 GB SATA drives took approximately 24 hours to rebuild; with nine 500 GB drives and almost exactly the same data set, it took fewer than eight hours.
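
A rough way to sanity-check numbers like these is to divide the failed drive's capacity by an effective rebuild rate, which on a busy array can be far below the drive's raw streaming speed and which varies with the number of drives, the controller, and foreground load. A back-of-envelope sketch (the rates are illustrative assumptions, not measurements):

    # Back-of-envelope rebuild time: capacity of the failed drive divided by
    # an assumed effective rebuild rate.

    def rebuild_hours(drive_gb: float, rebuild_mb_per_s: float) -> float:
        return (drive_gb * 1024) / rebuild_mb_per_s / 3600

    for rate in (6, 25, 100):  # MB/s: heavily loaded, moderately loaded, idle
        print(f"500 GB drive at {rate:>3} MB/s effective: {rebuild_hours(500, rate):.1f} h")

    # roughly 24 h at 6 MB/s, 5.7 h at 25 MB/s, 1.4 h at 100 MB/s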

bloubert

"the rebuild times being less impactful on disk with RAID 6 even though they take longer"

robertog169

I would think it's faster because you have twice as many parity blocks, so reads on the parity aren't isolated to single blocks/disks. If one block is busy, the same parity is available elsewhere and the reader doesn't have to wait for one disk. Anyone agree?? =)

drn

With the price of disks being so cheap, it no longer makes business sense to take the hit of slower writes and long recoveries with RAID 5 or 6. Nowadays, I never see a business case for anything other than RAID 1+0 (sometimes incorrectly called RAID 10).

juanfermin

For the most part you lose about a third of total space with a minimum RAID5 set, and with RAID 6 it's slightly more because the parity structure is doubled. RAID 5 requires a minimum of 3 drives and RAID 6 requires 4; the space efficiency gets better the more drives you add, especially with RAID 6. RAID 6 will always use more space for parity than RAID 5, but less than RAID 10, so it's a good balance for those who want 4 drives or more with better protection than RAID 5 without losing quite as much space.

rhys

RAID 10 and RAID 6 both require a minimum of 4 disks. When RAID 10 loses a disk you still have your RAID 0 stripe spanning your two (or more) mirrors, albeit one mirror has a failed drive. The next drive you lose will cause total failure if it is on the mirror that already has a failed drive; if it is on another mirror, there is no effect. For data protection this is obviously worse than RAID 6, where any two drives can fail and you have not lost data.

A RAID 10 array with, say, 12 disks stores the equivalent of an 8-disk RAID 6 array. Loss of two disks can fail the whole array, but the odds are now reduced to a 9% chance (1/11) that the next failure will be on the one drive that will cause total failure. Potentially this array could lose six disks with no data loss. However, no matter the number of disks in the array, two failures in the same mirror still result in total failure (unless you mirror to multiple disks, which really puts the cost up). RAID 10 also has the advantage of minimal impact on disk speed during the rebuild.

RAID 6 gets less resilient as the array grows. This can be combated by keeping arrays smaller and using RAID 60, or in the extreme there is the suggestion above to use RAID 16. If you need 100% uptime, RAID 16 could be an option: the rebuild time is minimised by the mirrors, and the susceptibility to total failure is minimised by the overlying RAID 6. When considering protection against failure during rebuild, RAID 10 has the advantage that you are only rebuilding a single-drive mirror.
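
The 1-in-11 figure above generalises: after the first failure in a RAID 10 array, only the failed drive's mirror partner is fatal to lose next. A small sketch of that simplified model (it ignores unrecoverable read errors and assumes the second failure is equally likely on any surviving drive):

    # Probability that a second, independent drive failure destroys the array,
    # given one drive has already failed. RAID 6 survives ANY second
    # whole-drive failure, so its figure is 0% under this model.

    def raid10_second_failure_fatal(num_disks: int) -> float:
        # Only the failed drive's mirror partner is fatal to lose next.
        return 1 / (num_disks - 1)

    for n in (4, 8, 12, 24):
        print(f"{n:>2}-disk RAID 10: {raid10_second_failure_fatal(n):.1%} "
              f"chance the next failure is fatal (RAID 6: 0.0%)")

    # 12-disk RAID 10: 9.1% chance the next failure is fatal (RAID 6: 0.0%)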

b4real

On current drives. That is one of the big stinks about Amazon not using hardware RAID in some of the cloud solutions.

b4real

Sure. I get stuck in the overhead tradeoff (cost) for the extra protection. You've given me a good idea, however.

bloubert

I didn't see a separate discussion thread, indicating the author edited the original article, until after I posted.

rhys

Disclaimer: I have never endured a production RAID rebuild.

The RAID 6 diagram in the article appears flawed; it is only accurate for the top three blocks on each of the four disks, where it shows dual parity bits. On the bottom two blocks the diagram appears to be RAID 5.

Ignoring the bottom two blocks, if we remove any disk (due to failure) we see that there must be a parity calculation for every block on the disk being rebuilt (assuming the two parity algorithms are independent). Your RAID controller has to do the block-by-block parity calculation on the relevant parity algorithm before writing. To me this seems at least as much effort as rebuilding RAID 5, where every block requires a parity calc, but always on the same parity algorithm. If during your rebuild you have normal read/write activity going on, and we compare arrays with the same usable space, RAID 6 should require less effort than RAID 5 because the RAID 6 array will have some bits that require no parity calc on reads. The greater the number of disks in the array, the greater the number of blocks requiring no calc.

I am junior, but I would suggest online storage and databases use RAID 10 (or 100, etc.): high resilience and performance with lower rebuild times, but high cost. Offset the high cost by moving as much data as you don't need daily to nearline storage (cheap disk with lower performance), where either RAID 5 or restore from backup is appropriate. But I deal with a small environment and it may be wildly different to your environment.

ebouza

I believe the way to go nowadays is with RAID 1+0. I submit that for important data, RAID 1+0 should be your first choice. It offers good performance (not as good as RAID 5 on reads), but it is much more resilient than RAID 5 when you do have a disk failure.

maj37

If you have a large array with lots of data scattered out over many disks, and sufficient cache, is there really any significant performance improvement in RAID 1+0 over RAID 1, and if so, is it really worth the extra complexity? maj

DomBenson

For server-local storage, when the cost difference between a 6-disk RAID6 and an 8-disk RAID10 is just the two extra drives, I agree, RAID10 is a very good choice. As mentioned above, with large numbers of drives, there are substantial increases in cost in provisioning and running the additional enclosures to accommodate the extra drives required. (Imagine a 144-drive 12x12 RAID 60 - it would take a 240-drive RAID10 to match the capacity.) RAID 10 is still potentially vulnerable to a double-disk failure - if the second disk is the other half of the degraded subvolume. This isn't as unlikely as you might expect (that is, that if a second disk fails it will be that particular drive), as the remaining disk in the set is under significantly higher load. RAID 6/60 mitigate this by tolerating *any* second failure, and offer substantial increases in capacity and read performance owing to the wider stripe. Provided you keep each RAID6 set down to a relatively modest number of disks (depending on their performance/capacity ratio), recoveries can take an acceptably short time, and (with a suitably sized controller cache) the write overhead is modest.

turbinepilot

True, in small scale scenarios the price differential between RAID 1+0 and RAID 5 or RAID 6 isn't that great. However, in larger data centers, the cost to implement RAID 1+0 universally can exceed management's tolerance for expense. Recently we installed a storage array which cost just a bit over $400K exclusive of the disk drives. After dedicating a few slots for hot spares, our array had 570 slots that could be used for business storage purposes. Based upon our growth projections, we would have exhausted the array's empty disk slots in four years had we implemented RAID 1+0. In contrast, by implementing RAID 5 we expect the frame to meet our storage needs for at least 6 years. True, we pay a penalty in write performance by using RAID 5 versus RAID 1+0, but the array's cache effectively insulates the applications from that penalty. In day-to-day operational monitoring we rarely see instances where cache is exhausted and the application must drop to disk speed.

jhoward

Our needs are not so much giant storage requirements but more data integrity and performance. You can argue that the up front costs (for drives) are double that of a RAID5 configuration but after the first or second drive failure it easily pays for itself in time and performance. We have a few very large and write intensive databases working off of RAID1+0 arrays and I know they have saved me a few full days of work at least when drives have failed.

jjcanaday

That does clear up their position a lot.

jeslurkin

Okay,... I _have_ been deprived of most movies since the late 70s. :) You are right,... less 'puteing, more tee wee. :)

RF7000

I just had a flashback of austin powers in goldmember. you need to get off the computer and watch more tv :)

jeslurkin

...to be _very_ 'sensitive' to the RAID choices of others,... and more Danish than Dutch. So, you're saying they're OK? :)

b4real

There is a formal movement against RAID5. The passion, you have to love it!

munsch

My browser hadn't loaded the whole page yet - this thread got LONG! and apparently several people have already invoked BAARF. Good. heh.

gavin142

Adding the stripe allows you more spindles to read the data from. More spindles means you're less restricted by a single disk's data transfer speeds. I'd say the two biggest factors which control the amount of performance boost you'd realize would have to be your controller(s) and the drives themselves. If your controllers are sub-par, they alone can kill any potential performance increases you would normally realize. Conversely, an array of 15K SAS drives will certainly outperform the same number of SATA disks (with the possible exception of SSD's with SATA2/3 interfaces)
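
One crude way to reason about the spindle-count effect is a simple sequential-read model: aggregate throughput grows with the number of data spindles until the controller or bus becomes the bottleneck. A sketch with entirely illustrative throughput figures:

    # Crude sequential-read model: throughput scales with data spindles until
    # the controller (or bus) becomes the bottleneck. All numbers are
    # illustrative assumptions, not benchmarks.

    def est_read_mb_s(data_disks: int, per_disk_mb_s: float, controller_mb_s: float) -> float:
        return min(data_disks * per_disk_mb_s, controller_mb_s)

    for disks in (4, 8, 16):
        sata = est_read_mb_s(disks, per_disk_mb_s=120, controller_mb_s=1600)
        sas = est_read_mb_s(disks, per_disk_mb_s=180, controller_mb_s=1600)
        print(f"{disks:>2} data disks: ~{sata:.0f} MB/s (SATA), ~{sas:.0f} MB/s (15K SAS)")

    # At 16 data disks both cases hit the assumed 1600 MB/s controller ceiling.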

rhys

Thanks for taking the time to post. That is really useful. Once I model our environment I can be surer of advice I get.

b4real

Many organizations just can't hack the cost of the overhead, especially on SAS storage. SATA, maybe.

MikeGall

Hi, true what you say about mirroring speeds. But RAID 1 +0 still leaves your data vulnerable while the rebuild onto the hot spare is in progress. Sometimes you get oddities in the RAID system and have to reboot it to get it to recognize things properly (at least that has been my experience with a few vendors products). Often that is enough to push another disk over the edge.

drn

Yes, RAID 1+0 is susceptible to having the second disk of a mirrored pair fail. But that's why you keep hot spares in the RAID box, and why you bought a RAID box that is smart enough to automatically break the mirror on the bad drive and resync the mirror using a hot spare. The syncing of a hot spare in RAID 1+0 is magnitudes faster than a RAID 5/6 recovery.

DomBenson

Using RAID 60 over RAID 6 (IMHO) is not just about raw resilience, but recoverability, write performance and scale. I'll try to roughly quantify these. We will assume that disk failures are independent (so environmental events that would plausibly destroy the entire array are out of scope, and provided for by a genuinely independent backup), and that drive failure during rebuild is uniform over data read (ignoring the contribution from the MTTF for a random failure, which also increases risk with wide RAID6 sets).

1) Recoverability: By halving (or more) the width of each RAID set, the number of bits you have to read to rebuild a drive is much reduced, hence the risk of further drive failures during rebuild is reduced.

For example, consider 48 2TB drives at a UER of 1E-15 (e.g. WD RE4 or Seagate Constellation ES). With 6 8-disk RAID6 sets, the array provides a total of 6*(8-2)*2 = 72TB. If a single drive fails, then all 12TB of data in the RAID6 set it was within must be read to rebuild it. This is 12*8*1024^4 bits (about 1E14). The probability of two (or more) unrecoverable errors is thus (a binomial):

1 - (1-1E-15)^(1E14) - (1E14 C 1)*(1-1E-15)^(1E14-1)*1E-15
= 1 - (1-1E-15)^(1E14) - (1-1E-15)^(1E14-1)/10

Approximating (since 1E14*1E-15/(1-1E-15) is very close to 0.1):

1 - 1.1*((1-1E-15)^(1E14))

We can expand this as a Taylor series (ignoring further terms, which rapidly become extremely small - each at most 1/10 of the previous, with alternating signs):

1 - 1.1 * (1 - 0.1 + (1E14 C 2)*1E-30 - ...)
~= 1 - 1.1 * (0.9 + (1E14*(1E14-1)/2)*1E-30)

Approximating 1E14*(1E14-1) as 1E28:

1 - 1.1 * (0.9 + 1E28/2*1E-30) = 1 - 1.1 * (0.9 + 0.005) = 0.0045

hence a 0.45% chance of failure during rebuild.

If we do the same for a single wide RAID6 volume (using only 38 disks, to provide the same capacity), the number of bits becomes 72*8*1024^4, about 6E14. Substituting this in:

1 - (1-1E-15)^(6E14) - 0.6*(1-1E-15)^(6E14-1)
~= 1 - 1.6 * (1 - 0.6 + (6E14 C 2)*1E-30 - ...)
~= 1 - 1.6 * (0.4 + (6E14*(6E14-1)/2)*1E-30)
~= 1 - 1.6 * (0.4 + 3.6E29/2*1E-30) = 0.072

hence a 7.2% chance of failure during rebuild. These are only approximate, as there are some not entirely negligible terms that we have truncated, but you should be able to satisfy yourself that the error is small relative to the difference.

2) Write performance: Parity RAID in general suffers from performance overheads when writing data less than the total stripe width, yet reads see substantial improvements from being able to satisfy a request from a handful of physical disks. The wider the stripe, the worse this becomes; also, the fewer partial stripes can be held in the controller cache to speed parity calculation. By splitting the array into smaller RAID6 sections, each write (hence parity calculation) depends on fewer drives and involves less unwanted data. This is a bit vague - your mileage will vary - but the distinction is non-trivial in highly transactional workloads, as each sub-volume is essentially independent.

3) Scale: It is perfectly feasible to implement the RAID0 at a higher level than the RAID6. In this case, multiple hardware controllers can be harnessed for their individual bandwidth and calculation capacity (and cache), but all contribute towards a single storage pool. Some controllers can, I think, interact along the PCIe bus to achieve much the same thing.

SAS expanders (and SATA-II support for expanders) make this somewhat less useful, as a single controller can potentially connect to many more disks than it has native ports for, but the limitations on total throughput and parity engine/cache capacity still apply.
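
For anyone who would rather let the computer do the arithmetic, here is a small sketch that evaluates the same binomial tail directly instead of via the truncated series above (same assumptions: a 1E-15 UER and every bit of the degraded set read once). It gives roughly 0.5% for the 8-disk sets and roughly 13% for the single wide set - a little higher than the hand expansion, but the comparison comes out the same way.

    import math

    # Probability of two or more unrecoverable read errors while reading the
    # whole degraded RAID 6 set, with each bit failing independently at rate
    # "uer". Evaluated directly from the binomial distribution.

    def prob_two_or_more_errors(data_tb: float, uer: float = 1e-15) -> float:
        bits = data_tb * 8 * 1024**4                 # data to read, in bits
        log_q = bits * math.log1p(-uer)              # log P(zero errors), computed stably
        p0 = math.exp(log_q)                         # P(exactly 0 errors)
        p1 = bits * uer * math.exp(log_q - math.log1p(-uer))  # P(exactly 1 error)
        return 1 - p0 - p1

    print(f"8-disk RAID 6 set, 12 TB to read:  {prob_two_or_more_errors(12):.2%}")
    print(f"38-disk RAID 6 set, 72 TB to read: {prob_two_or_more_errors(72):.2%}")

    # roughly 0.5% for the 12 TB rebuild and roughly 13% for the 72 TB rebuild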

MikeGall

I had a couple times in a one year span where RAID 6 saved my bacon. The reason: a second drive failure on the same disk array before the replacement drive arrived and/or fully rebuilt. It happens. Drives fail for a reason and it can be that that area of the data centre got more vibration, heat or whatever than other areas and you have several disk failures localized in space and time.

rhys

Can anyone point me to resources or data on failure rates and risk calculation for disk or array failures? I can see that RAID 60 is a more resilient approach to presenting a large contiguous storage space than a single RAID 6 array. I want to be able to quantify this to my boss who has an accountancy background, and is not IT technical. Has anyone done real world calcs to work out crossover points for risk appetite and cost? Even pointers to real world risk calcs would be a great start.
