I would be interested in learning why rebuild times should be faster with Raid 6. It doesn't seem obvious, as the rebuild process still has to read or write the same number of drives with raid 5 or raid 6 - doesn't it?
With the price of disks being so, so cheap, it no longer makes business sense to take the hit of the slower writes and long recoveries of RAID 5 or 6. Nowadays, I never see a business case for anything other than RAID 1+0 (sometimes incorrectly called RAID 10).
Our needs are not so much giant storage requirements but more data integrity and performance.
You can argue that the up front costs (for drives) are double that of a RAID5 configuration but after the first or second drive failure it easily pays for itself in time and performance.
We have a few very large and write intensive databases working off of RAID1+0 arrays and I know they have saved me a few full days of work at least when drives have failed.
True, in small scale scenarios the price differential between RAID 1+0 and RAID 5 or RAID 6 isn't that great. However, in larger data centers, the cost to implement RAID 1+0 universally can exceed management's tolerance for expense.
Recently we installed a storage array which cost just a bit over $400K exclusive of the disk drives. After dedicating a few slots for hot spares our array had 570 slots that could be used for business storage purposes.
Based upon our growth projections we would have exhausted the array's empty disk slots in four years had we implemented RAID 1+0. In contrast, by implementing RAID 5 we expect the frame to meet our storage needs for at least 6 years.
True, we pay a penalty in write performance by using RAID 5 versus RAID 1+0 but the arrays' cache effectively insulates the applications from that penalty. In day-to-day operational monitoring we rarely see instances where cache is exhausted and the application must drop to disk speed.
For server-local storage, when the cost difference between a 6-disk RAID6 and an 8-disk RAID10 is just the two extra drives, I agree, RAID10 is a very good choice. As mentioned above, with large numbers of drives, there are substantial increases in cost in provisioning and running the additional enclosures to accommodate the extra drives required. (Imagine a 144 drive 12x12 RAID 60 - it would take a 240 drive RAID10 to match the capacity).
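The drive counts quoted above can be checked with a small Python sketch. The function names are made up for illustration, and it assumes equal-size drives, two-way mirrors and no hot spares:

```python
def raid60_data_drives(sets: int, drives_per_set: int) -> int:
    """Drives' worth of usable capacity in a RAID 60 built from `sets` RAID 6 groups."""
    if drives_per_set < 4:
        raise ValueError("a RAID 6 set needs at least 4 drives")
    # Each RAID 6 set gives up two drives' worth of space to parity.
    return sets * (drives_per_set - 2)


def raid10_drives_needed(data_drives: int) -> int:
    """Total drives a two-way mirrored RAID 10 needs for the same usable capacity."""
    return 2 * data_drives


usable = raid60_data_drives(sets=12, drives_per_set=12)
print(usable, raid10_drives_needed(usable))  # 120 240
```

The same functions reproduce the small case too: a single 6-disk RAID6 set has 4 data drives, which an 8-disk RAID10 matches.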
RAID 10 is still potentially vulnerable to a double-disk failure - if the second disk is the other half of the degraded subvolume. This isn't as unlikely as you might expect (that is, the chance that, given a second disk fails, it will be that drive), as the remaining disk in the set is under significantly higher load.
RAID 6/60 mitigate this by tolerating *any* second failure, and offer substantial increases in capacity and read performance owing to the wider stripe. Provided you keep each RAID6 set down to a relatively modest number of disks (depending on their performance/capacity ratio), recoveries can take an acceptably short time, and (with a suitably sized controller cache), the write overhead is modest.
Can anyone point me to resources or data on failure rates and risk calculation for disk or array failures?
I can see that RAID 60 is a more resilient approach to presenting a large contiguous storage space than a single RAID 6 array.
I want to be able to quantify this to my boss who has an accountancy background, and is not IT technical. Has anyone done real world calcs to work out crossover points for risk appetite and cost? Even pointers to real world risk calcs would be a great start.
Using RAID 60 over RAID 6 (IMHO) is not just about raw resilience, but recoverability, write performance and scale.
I'll try to roughly quantify these. We will assume that disk failures are independent (so environmental events that would plausibly destroy the entire array are out of scope, and provided for by a genuinely independent backup), and that drive failure during rebuild is uniform over data read (ignoring the contribution from the MTTF for a random failure, which also increases risk with wide RAID6 sets).
1) Recoverability: By halving (or more) the width of each RAID set, the number of bits you have to read to rebuild a drive is much reduced, hence the risk of further drive failures during rebuild is reduced. For example, consider 48 2TB drives, at a UER of 1E-15 (e.g. WD RE4 or Seagate Constellation ES). With 6 8-disk RAID6 sets, the array provides a total of 6*(8-2)*2 = 72TB. If a single drive fails, then all 12TB of data in the RAID6 set it was within must be read to rebuild it. This is 12*8*1024*1024*1024*1024 bits (about 1E14). The probability of two (or more) unrecoverable errors is thus:
1 - (1-1E-15)^(1E14) - (1E14 C 1)*(1-1E-15)^(1E14-1)*1E-15
as it is a binomial.
This is 1 - (1-1E-15)^(1E14) -(1-1E-15)^(1E14-1)/10
Approximating (as 1/((1-1E-15)*10) is very close to 0.1):
1 - 1.1*((1-1E-15)^(1E14))
We can expand this as a Taylor series into:
1 - 1.1 * (1 - 0.1 + (1E14 C 2)*1E-30 - ...)
(ignoring further terms, which are rapidly extremely small - each at most 1/10 the previous, and with alternating signs)
1 - 1.1 * (0.9 + (1E14! / ((1E14-2)!*2!))*1E-30)
Approximate (1E14*(1E14-1)) as 1E28
1 - 1.1 * (0.9 + 1E28/2*1E-30)
= 1 - 1.1 * (0.9 + 0.005)
= 0.0045
hence a 0.45% chance of failure during rebuild.
If we do the same for a single wide RAID6 volume (using only 38 disks, to provide the same capacity):
The number of bits becomes 72*8*(1024)^4 - about 6E14
Substituting this in:
This is 1 - (1-1E-15)^(6E14) - 0.6*(1-1E-15)^(6E14-1)
1 - 1.6 * (1 - 0.6 + (6E14 C 2)*1E-30 - ...)
~= 1 - 1.6 * (0.4 + (6E14! / ((6E14-2)!*2!))*1E-30)
~= 1 - 1.6 * (0.4 + 3.6E29/2*1E-30)
= 0.072
hence a 7.2% chance of failure during rebuild.
These are only approximate, as there are some not entirely negligible terms that we have truncated (for the wide volume in particular: without truncation, 1 - 1.6*e^-0.6 comes to about 0.12), but you should be able to satisfy yourself that the error is small relative to the difference.
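For anyone who wants to plug in their own drive sizes, here is a minimal Python sketch of the calculation above. It evaluates the same binomial expression directly rather than truncating the series; the function name is hypothetical, and the bit counts are the rounded 1E14 and 6E14 figures used in the post:

```python
import math


def p_two_or_more_errors(bits_read: float, uer: float) -> float:
    """P(at least two unrecoverable read errors) for a binomial(bits_read, uer)."""
    log_ok = math.log1p(-uer)  # log(1 - uer), accurate even for tiny uer
    p_zero = math.exp(bits_read * log_ok)
    p_one = bits_read * uer * math.exp((bits_read - 1) * log_ok)
    return 1.0 - p_zero - p_one


narrow = p_two_or_more_errors(1e14, 1e-15)  # one 8-disk RAID6 set, 12TB read
wide = p_two_or_more_errors(6e14, 1e-15)    # one 38-disk RAID6 set, 72TB read
print(f"{narrow:.4f} {wide:.4f}")  # 0.0047 0.1219
```

Like the hand calculation, this assumes independent bit errors at the quoted UER and ignores whole-drive failures during the rebuild.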
2) Write performance:
Parity RAID in general suffers from performance overheads when writing data less than the total stripe width. Yet reads see substantial improvements from being able to satisfy a request from a handful of physical disks.
The wider the stripe width, the worse this becomes. Also, the wider the stripe, the fewer partial stripes can be held in the controller cache to speed parity calculation. By splitting the array into smaller RAID6 sections, each write (hence parity calculation) depends on fewer drives, and involves less unwanted data. This is a bit vague - your mileage will vary - but the distinction is non-trivial in highly transactional workloads, as each sub-volume is essentially independent.
3) Scale:
It is perfectly feasible to implement the RAID0 at a higher level than the RAID6. In this case, multiple hardware controllers can be harnessed for their individual bandwidth and calculation capacity (and cache), but all contribute towards a single storage pool.
Some controllers can, I think, interact along the PCIe bus to achieve much the same thing.
SAS expanders (and SATA-II support for expanders) make this somewhat less useful, as a single controller can potentially connect to many more disks than it has native ports for, but the limitations on total throughput and parity engine/cache capacity still apply.
Thanks for taking the time to post. That is really useful.
Once I model our environment I can be surer of advice I get.
I had a couple times in a one year span where RAID 6 saved my bacon. The reason: a second drive failure on the same disk array before the replacement drive arrived and/or fully rebuilt. It happens. Drives fail for a reason and it can be that that area of the data centre got more vibration, heat or whatever than other areas and you have several disk failures localized in space and time.
Yes, RAID 1+0 is susceptible to having the second disk fail in a mirrored pair. But that's why you keep hot spares in the RAID box, and why you bought a RAID box that is smart enough to automatically break the mirror on the bad drive and resync the mirror using a hot spare. The syncing of a hot spare in RAID 1+0 is magnitudes faster than a RAID 5/6 recovery.
Hi, true what you say about mirroring speeds. But RAID 1+0 still leaves your data vulnerable while the rebuild onto the hot spare is in progress. Sometimes you get oddities in the RAID system and have to reboot it to get it to recognize things properly (at least that has been my experience with a few vendors' products). Often that is enough to push another disk over the edge.
Many organizations just can't hack the cost of the overhead, especially on SAS storage. SATA, maybe.
If you have a large array with lots of data scattered out over many disks, and sufficient CACHE, is there really any significant performance improvement in RAID 1+0 over RAID 1, and if so is it really worth the extra complexity?
maj
Adding the stripe allows you more spindles to read the data from. More spindles means you're less restricted by a single disk's data transfer speeds. I'd say the two biggest factors which control the amount of performance boost you'd realize would have to be your controller(s) and the drives themselves.
If your controllers are sub-par, they alone can kill any potential performance increases you would normally realize. Conversely, an array of 15K SAS drives will certainly outperform the same number of SATA disks (with the possible exception of SSD's with SATA2/3 interfaces)
I believe the way to go nowadays is with RAID 1+0. I submit that for important data, RAID 1+0 should be your first choice. It offers good performance (not as good as RAID 5 on reads), but it is much more resilient than RAID 5 when you do have a disk failure.
I would think it's faster because you have 2 times as many parity blocks, so reads on the parity aren't isolated to single blocks/disks. If one block is busy the same parity is available elsewhere and the reader doesn't have to wait for 1 disk. Anyone agree?? =)
Disclaimer: I have never endured a production RAID rebuild.
The RAID 6 diagram in the article appears flawed: it is only accurate for the top three blocks on each of the four disks, where it shows dual parity bits. On the bottom two blocks the diagram appears to be RAID 5.
Ignoring the bottom two blocks, if we remove any disk (due to failure) we see that there must be parity calculation for every block on the disk being rebuilt (assuming the two parity algorithms are independent).
Your RAID controller has to do the block by block parity calculation on the relevant parity algorithm before writing. To me this seems at least as much effort as rebuilding RAID 5 where every block requires a parity calc, but always on the same parity algorithm.
If during your rebuild you have normal read write going on, and we compare arrays with the same usable space RAID 6 should require less effort than RAID 5 because the RAID 6 will have some bits that require no parity calc on reads. The greater the number of disks in the array the greater the number of blocks requiring no calc.
I am junior but I would suggest online storage and Dbase use RAID 10 (or 100 etc). High resilience and performance with lower rebuild times but high cost.
Offset the high cost by moving as much data as you don't need daily to nearline storage (cheap disk with lower performance), where either RAID 5 or restore from backup is appropriate. But I deal with a small environment and it may be wildly different to your environment.
"the rebuild times being less impactful on disk with RAID 6 even though they take longer"
I didn't see a separate discussion thread, indicating the author edited the original article, until after I posted.
Hard disk capacities are rapidly increasing, and so are RAID rebuild times. The problem is particularly apparent with SATA drive technology, but as 300 GB Fibre Channel drives become more prevalent, even Fibre Channel-based arrays are suffering from long array rebuild times. Many storage administrators are using fewer and fewer drives in an array group. Although using larger capacity drives may make economic sense, fewer drives equate to longer rebuild times. A RAID 5 array with five 500 GB SATA drives took approximately 24 hours to rebuild. With nine 500 GB drives and almost the exact same data set, it took fewer than eight hours. Below are some tips to help you control RAID rebuild times.
Try to avoid RAID5 and RAID6 like the plague. Use RAID10 (or RAID 1+0) instead. S.A.M.E. Stripe and Mirror Everything. It will provide the best performance and the best reliability. The extra cost of RAID10 is not enough to warrant the reduced reliability and increased recovery time of RAID5 or RAID6.
Sure. I get stuck in the overhead tradeoff (cost) for the extra protection.
You've given me a good idea, however.
...if we didn't mention BAARF in this discussion: The Battle Against Any Raid Five.
http://www.miracleas.com/BAARF/BAARF2.html
My browser hadn't loaded the whole page yet - this thread got LONG! and apparently several people have already invoked BAARF. Good. heh.
There is a formal movement against RAID5.
The passion, you have to love it!
http://www.miracleas.com/BAARF/BAARF2.html
if there are 2 things i cannot tolerate, it's people insensitive to other people's raid choice, and the Dutch.
...to be _very_ 'sensitive' to the RAID choices of others,... and more Danish than Dutch. So, you're saying they're OK?
I just had a flashback of austin powers in goldmember. you need to get off the computer and watch more tv
Okay,... I _have_ been deprived of most movies since the late 70s. 
You are right,... less 'puteing, more tee wee.
and you dont know ,......oh some of blindness will be useful in that situation as it like the big and horrible start
If zero data loss is the only goal, then RAID 6 is more efficient than RAID 1, which has 50% overhead. But where 100% uptime is required, you still can't beat RAID 1. Perhaps a RAID 1+6 solution, while expensive, is the best for large storage systems.
If I have it right then you suggest a bunch of disks each in a RAID 1 mirror, then a RAID 6 spanning the mirrored disks.
I can see there is greater protection than RAID 10, or 15, but what is the actual difference in risk between RAID 15 and RAID 16?
For total failure:
RAID 15 requires a minimum of four failures in less than the time required for two consecutive mirror rebuilds.
RAID 16 requires a minimum of six failures in less than the time required for three consecutive mirror rebuilds.
I haven't seen a RAID 1 rebuild happen in production, and I don't have stats on the failure rate of mirror rebuilds. I have never heard of a RAID 1 rebuild failing, but I am junior.
RAID 16 requires a minimum of 8 disks.
Probability of losing both disks from one of the mirrors from the overlying RAID 6 is 1/7th of the probability of a mirror rebuild failure, or lower.
Now you have same resilience as RAID 15 (or near enough).
This is the advantage of RAID 16 over RAID 15.
Given the conditional probability for total failure of RAID 15 what risk are we looking at here?
Is the extra layer of RAID 16 really adding protection that you need? High Availability is HA, but what are you aiming for that you need RAID 16 rather than RAID 15?
Can anyone give an expected failure rate for RAID 15 or RAID 16?
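It does not answer the failure-rate question, but the minimum-failure counts above follow from a simple rule, sketched here in Python. The function names are made up for illustration; the sketch assumes parity RAID layered over two-way mirrors, a 3-member minimum for RAID 5 and a 4-member minimum for RAID 6:

```python
def min_failures_for_data_loss(parity_drives: int, mirror_width: int = 2) -> int:
    """Fewest drive failures that can destroy parity RAID layered over mirrors.

    The outer array is lost once (parity_drives + 1) members are gone, and each
    member is a mirror that is only gone when all of its drives have failed.
    """
    return (parity_drives + 1) * mirror_width


def min_total_drives(outer_min_members: int, mirror_width: int = 2) -> int:
    """Smallest drive count for the nested array."""
    return outer_min_members * mirror_width


print(min_failures_for_data_loss(1))  # RAID 15 -> 4
print(min_failures_for_data_loss(2))  # RAID 16 -> 6
print(min_total_drives(4))            # RAID 16 -> 8
```

These are best-case-for-the-attacker counts: the failures have to land on exactly the right mirrors, so the real-world risk is lower than the raw counts suggest.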
No redundancy? I don't see them advocating for any type of data safeguarding. Do they think nightly backups is enough?
Maybe I didn't take enough time (and I didn't download any of the pdf docs) but, all I can see is them advocating AGAINST RAID -3 -4 -5. Is RAID -0 or -1 OK?
Thanks for pointing the way.
Raid Five & Six are compromises that are not needed.
All that money spent on hardware to achieve reliability that induces failure points on its own.
Interesting and informative, but his treatise ignores the CRC built into the drives that are designed to prevent the returning of garbage by the drive. This CRC is a drive level function not a RAID controller function so it is done independent of any array hardware.
maj
On current drives. That is one of the big stinks about Amazon not using hardware RAID in some of the cloud solutions.
I agree that raid 5 is dangerous. Oftentimes when the controller is hammering at the drives during a rebuild, it notices that another drive has failed.
Raid 10 can tolerate up to 2 drive failures. That does not mean that any 2 drives can fail, only 2 drives as long as certain conditions are met. Mainly, as long as the drives are not both in the same mirror, data will not be lost.
Raid 6 can handle ANY 2 drive failures. It doesn't matter which 2. Add to this background consistency checks and raid 6 offers very high reliability.
Raid 6 also offers higher performance by using all drives for reading and writing, not just half. As far as the performance drops, this is usually a constraint on the processing in the controller, while raid 10 is constrained by the hard drive transfer speeds. A good controller in a raid 6 outperforms a raid 10 if all other factors remain the same. I don't pay much attention to online benchmarks. I do my own testing.
RAID 10 and RAID 6 both require a minimum of 4 disks.
When RAID 10 loses a disk you still have your RAID 0 stripe spanning your two (or more) mirrors albeit one mirror has a failed drive. The next drive you lose will cause total failure if the drive that fails is on the mirror that already has a failed drive. If it is on the other mirror no effect.
For data protection this is obviously worse than RAID 6 where any two drives can fail and you have not lost data.
A RAID 10 array with say 12 disks stores the equivalent of an 8 disk RAID 6 array. Loss of two disks can fail the whole array but the odds are now reduced to a 9% chance (1/11) the next failure will be on the one drive that will cause total failure.
Potentially this array could lose six disks with no data loss.
However, no matter the number of disks in the array, two failures in the same mirror still result in total failure (unless you mirror to multiple disks, which really puts cost up).
RAID 10 also has the advantage of minimal impact on disk speed during the rebuild.
RAID 6 gets less resilient as the array grows. This can be combated by keeping arrays smaller and using RAID 60, or in the extreme there is the suggestion above for use of RAID 16. If you need 100% uptime RAID 16 could be an option. The rebuild time is minimised by the mirrors, and the susceptibility to total failure is minimised by the overlying RAID6.
When considering protection against failure during rebuild RAID 10 has the advantage that you are only rebuilding a single drive mirror.
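The 12-disk RAID 10 figures above (a 1-in-11 chance on the second failure, up to six survivable failures) can be reproduced with a short Python sketch. The function names are made up for illustration, and it assumes two-way mirrors with every surviving drive equally likely to fail next, which is optimistic given the extra load on the degraded mirror's partner mentioned earlier in the thread:

```python
from fractions import Fraction


def raid10_second_failure_fatal(total_drives: int) -> Fraction:
    """Chance that the second failed drive is the mirror partner of the first."""
    if total_drives < 4 or total_drives % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    # Exactly one of the remaining drives is the partner of the failed one.
    return Fraction(1, total_drives - 1)


def raid10_max_survivable_failures(total_drives: int) -> int:
    """Best case: one drive lost from every mirror, none from the same pair."""
    return total_drives // 2


p = raid10_second_failure_fatal(12)
print(p, f"{float(p):.1%}", raid10_max_survivable_failures(12))  # 1/11 9.1% 6
```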
RAID 6 rebuild times are slower than RAID 5. This article is incorrect.
You lightly touched on one major aspect of today's drives and inherent problems within: error correction on perfectly good drives.
Simply put, there's a massive amount of overhead on today's drives given the manufacturer's acceptable hardware tolerance for error correction. Read your drive specs, I think you'll be surprised.
The rule of thumb is: the larger the drive space you have, the larger the allotted error correction time, which = cpu performance overhead.
Life's great when we can store the equivalent of the Library of Congress on a few arrays. Nonetheless, one cannot simply ignore the data errors that exist on perfectly functioning drives. IT folks that don't factor in the leakage of errors are doomed to long hours of chasing the blame game from the upper echelon screaming "I thought this system was going to be reliable!!!"
Well Sir, it is, sort of.
Factor in aging drive spindle noise, worn internal mechanical components of the drive- and you have all the makings of a king size headache.
The bottom line is this: in any disk array, there are trade-offs in performance, integrity, and finally, cost.
Therefore, the reality of which array and which architecture to select must come down to two things: first, data integrity. Just how good is it? Secondly, cost. You can't ignore the CFO and their insistence to remain within a sane budget.
Anything less than that and you will fall into that age old trap of 'head in sand' viewpoint, and it WILL come back to bite you.
I have to admit, I did not understand your point. Are you saying that error correction is needed, or not needed? Please bring it down a notch for me, since I am trying to understand, but I do not yet.
The way I remember this (and it's been a while), with RAID 5 you need one extra drive, and with RAID 6 you need two. So for large arrays the difference in overhead (i.e., drives purchased vs. storage available) is not significant. Then in the case of a drive failure - which is still the most common failure mode for RAIDs - with RAID 5 you're running unprotected and need to get a replacement drive in "right now"; with RAID 6 you still have RAID-5-equivalent protection and can replace the failing drive on a more practical schedule.
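To put rough numbers on the overhead point, here is a small Python sketch (the function name is hypothetical; it assumes equal-size drives and counts only parity, not hot spares):

```python
def parity_overhead(total_drives: int, parity_drives: int) -> float:
    """Fraction of purchased capacity spent on parity."""
    if parity_drives >= total_drives:
        raise ValueError("need at least one data drive")
    return parity_drives / total_drives


for n in (4, 8, 16):
    raid5 = parity_overhead(n, 1)  # RAID 5: one drive's worth of parity
    raid6 = parity_overhead(n, 2)  # RAID 6: two drives' worth of parity
    print(f"{n} drives: RAID 5 {raid5:.2%}, RAID 6 {raid6:.2%}")
```

The gap between the two halves each time the drive count doubles, which is the sense in which the difference stops mattering for large arrays.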
I also prefer a RAID 6 solution over RAID 5 because of the potential for a 2nd hard drive going bad. This is especially true with larger capacity drives.
I recently had the opportunity to buy new NAS for home virtualization lab and went with RAID 6:
http://www.virtualizationtalk.net/153-budget-iscsi-nas-san-option-for-home-lab/
In certain cases, I would also consider a RAID 10 solution, as it can also sustain 2 failed drives (provided they are not in the same mirror).
Recently had a 5+ TB array in raid 5 for archiving data. The write performance was terrible and the rebuild took several days. In addition, the GUID Partition Table was not recognized by older clients. I replaced it with multiple arrays of RAID1. We still have the redundancy and better all-around performance.
Agreed, we've got a Promise VTrak with 16 1TB drives in a RAID6 configuration. We do lose some disk space and there is probably a slight performance hit, but the rebuild and fault tolerance benefits make it worth it. These 16 drives were manufactured very close to each other. If one drive dies, we think the probability of a second drive dying increases. RAID6 helps with that issue.