
Replace a failed drive in Linux RAID

Vincent Danen outlines the steps he recently took to identify a failing drive and then replace it with the least disruption possible.

A few weeks ago I had the distinct displeasure of waking up to a series of emails indicating that several RAID arrays on a remote system had degraded. The remote system was still running, but one of the hard drives was pretty much dead.

Upon logging in, I found that four of the six RAID devices for a particular drive pair were running in degraded mode: four partitions of the /dev/sdf device had failed, and the only two still operational were the /boot and swap partitions (the system runs three mirrored RAID1 pairs, six physical drives in total).

Checking the SMART status of /dev/sdf showed that SMART information on the drive could not be read. It was absolutely on its last legs. Luckily, I had a spare 300GB drive with which to replace it, so the removal and reconstruction of the RAID devices would be easy.
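For reference, a typical SMART check with smartctl (from the smartmontools package) looks something like this:

# smartctl -H /dev/sdf   # overall health self-assessment
# smartctl -i /dev/sdf   # drive identity: model, serial number, capacity

On a drive this far gone, both commands will usually error out or time out, which is itself a strong hint that the drive needs to be replaced.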

Still working remotely, I had to mark the two partitions on /dev/sdf that were still operational as faulty, which was done using:

# mdadm --manage /dev/md0 --fail /dev/sdf2
# mdadm --manage /dev/md1 --fail /dev/sdf3

Checking the RAID status output, I verified that all of the RAID devices associated with /dev/sdf were in a failed state:

# cat /proc/mdstat
Personalities : [raid1]
md6 : active raid1 sdc1[1] sda1[0]
      312568576 blocks [2/2] [UU]
...
md0 : active raid1 sdf2[1](F) sde2[0]
      1959808 blocks [2/1] [U_]

The output above is shortened for brevity as there are eight md devices.

The next step was to remove /dev/sdf from all of the RAID devices:

# mdadm --manage /dev/md0 --remove /dev/sdf2
# mdadm --manage /dev/md1 --remove /dev/sdf3
# mdadm --manage /dev/md2 --remove /dev/sdf5
...

Once all of the /dev/sdf devices were removed, the system could be halted and the physical drive replaced. If you do not have a drive of the exact same size, then you need to use a larger drive; if the replacement drive is smaller, rebuilding the arrays will fail.
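Drives with the same nominal size on the label can differ by a few sectors between vendors, so it is worth comparing exact capacities once the replacement is installed. One way to do this (using the device names from this system) is:

# blockdev --getsize64 /dev/sde   # surviving mirror member, size in bytes
# blockdev --getsize64 /dev/sdf   # replacement drive; must report at least as many bytes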

When the drive was replaced and the system powered back on, it booted, and from there it was a matter of creating the same partition layout on the new drive as existed on the old one. Because this was a series of mirrored RAID1 arrays, the working drive (/dev/sde) could be used as a template:

# sfdisk -d /dev/sde | sfdisk /dev/sdf

This creates the exact same partition layout on /dev/sdf as exists on /dev/sde. Once this is done, run fdisk -l on each drive to verify the partition layouts are identical. The next and final step is to add all of the new partitions to the existing RAID arrays. This is done using:

# mdadm --manage /dev/md0 --add /dev/sdf2
# mdadm --manage /dev/md1 --add /dev/sdf3
# mdadm --manage /dev/md2 --add /dev/sdf5
...

As you add the new devices to the existing arrays, the data in each array will be properly reconstructed. Depending on the size of the partitions, the re-sync could take anywhere from a few minutes to a few hours. You can cat /proc/mdstat to see the progress.
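Rather than re-running that by hand, something like the following will refresh the status every 30 seconds; the kernel's rebuild speed limits also live in /proc and can be raised if the re-sync is dragging and you can tolerate the extra I/O load:

# watch -n 30 cat /proc/mdstat              # re-display the re-sync status every 30 seconds
# cat /proc/sys/dev/raid/speed_limit_min    # current minimum rebuild speed, KB/s per device
# cat /proc/sys/dev/raid/speed_limit_max    # current maximum rebuild speed, KB/s per device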

With the size of drives available today, my primary concern is data integrity, and for that, nothing beats RAID1. The hardest part of replacing and reconstructing the RAID arrays was figuring out which of the six drives in the system was the faulty one and swapping it out. The longest part was the reconstruction, but that runs in the background; it may make the system a little sluggish, but the system stays online and available.

The total downtime of this exercise was perhaps 20 minutes. If uptime and data integrity are important, seriously consider using RAID1. It has saved me numerous times from dying or faulty hardware, and the effort required to use it is minimal.


About

Vincent Danen works on the Red Hat Security Response Team and lives in Canada. He has been writing about and developing on Linux for over 10 years and is a veteran Mac user.

15 comments
venk2ksubbu

Assume I have a system with 20 disks in 4 software RAID-5 arrays, without LEDs to show the health of the disks. I would like to know the physical location of the disk which has gone bad. How can I do this? Can you please suggest a solution? When I googled this I saw people saying to label the disks. Is this correct, or is there another software solution?

ferdie

Thanks for the info, I have been really stressing about this. That's the problem I have with Linux RAID: basic operations require you to read up on or document a range of obscure commands. Even software RAID BIOSes (like SI or Intel) have dead simple interfaces. (How about some kernel modules to work with onboard "fake" RAID?) If Linux wants larger acceptance, they really need to make sysadmins' lives easier, not harder. There is so much room for error in those commands that I would never dare attempt it with drives containing company data. If a hard drive failed on one of my Windows boxes, or on a Linux box with hardware RAID, I would yank the broken drive, stick in the new one and hit the rebuild button; it's really that simple. Until they can make it even close to that simple, I will stick with hardware RAID for Linux boxes. At least they took the time to make the sysadmin's life easier.

ozark-tek

Consider RAID 5 and hot swap drives and you can really minimize down time.

24hour

Great article Vincent! RAID 1 is the best configuration for redundancy and BCP. We see RAID 0/1/5/10 arrays come in at http://www.24hourdata.com all the time. Companies and users that have RAID 1 in place always get their data back and pay much less money. Most people do not understand that most RAID levels are built for fault tolerance, not redundancy.

moabrunner

I have a server that had 2 250GB drives set up in a RAID 1. I have installed 2 terabyte drives but the RAID will not expand to use the space. I have tried all kinds of stuff; anyone know how to expand the partition? I only have 3 partitions: boot, swap and ext3 for the data.

raimo.koski

It is not really Linux which is difficult, it is partitioning and LVM, and you don't really need those in most cases. I try to use whole disks without any partitions in RAID5/6 arrays, and then it is simply a matter of removing the disk, replacing it and adding the new one; or, if there is a hot spare, that can wait for a later time. The only commands you need are mdadm and some program like smartctl which displays drive serial numbers, plus a note with the failed and replacement drive serials.

For bootable disks that recipe isn't so good, but if you avoid logical partitions it makes life a bit easier. That way, copying just the first 16 sectors (boot record + partition table + GRUB stage 1) with dd, booted from a live CD/USB stick, to the replacement drive would be enough preparation, followed by a regular boot from the hard drive. This assumes you have RAID1.

I don't much mind complexity even if I try to avoid it, but the flexibility and reliability of SW RAID is much better than with HW RAID. mdadm can now grow an array with bigger devices or more devices, and even convert RAID-5 to RAID-6. I doubt any HW RAID can do those.

I am currently exploring ZFS and zpool (on OpenSolaris), and it has features like instant construction of raidz? arrays (raidz is like RAID-5, raidz2 like RAID-6, and raidz3 might some day have RAID-7 as an equivalent), which need cooperation between the file system and the RAID software; a separate RAID layer will never be that smart. mdadm is separate, and construction of a RAID array with parity takes ages even if it runs in the background. Btrfs on Linux is going the same direction as ZFS with integrated RAID. Guess what Sun recommends with ZFS? Use whole disks, no partitions or slices.
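A rough illustration of that dd step, with hypothetical device names (/dev/sda as the surviving drive, /dev/sdb as the blank replacement); double-check the devices before running it, since dd to the wrong disk is destructive:

# dd if=/dev/sda of=/dev/sdb bs=512 count=16   # copy the first 16 sectors: MBR, partition table and the start of GRUB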

MrJoeB

Even with RAID5, using mdadm you're still going to have to do the mentioned steps. If this were true hardware RAID, yes the rebuild/fail/add would be done automatically.

raimo.koski

1. Expand the partitions
2. Expand the RAID
3. Expand LVM, if you use it
4. Expand the filesystems

Step 1 determines how much each of the others can expand, so be careful with that; parted is the best tool for that step. If you are lucky or have planned ahead, you don't need to expand boot and swap, and ext3 is the last partition, so there is no need to move stuff around and even fdisk would suffice: just change the end of the last partition to the end of the disk.
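A rough sketch of those four steps for a two-disk RAID1 where the data partition is last on each disk; the device, array and volume group names below are hypothetical, and the parted command assumes a reasonably recent version:

# parted /dev/sda resizepart 3 100%      # step 1: grow the last partition to the end of the disk (repeat on /dev/sdb)
# mdadm --grow /dev/md2 --size=max       # step 2: grow the array into the now-larger partitions
# pvresize /dev/md2                      # step 3: grow the LVM physical volume (skip if no LVM)
# lvextend -l +100%FREE /dev/vg0/data    # ...and the logical volume
# resize2fs /dev/vg0/data                # step 4: grow the ext3/ext4 filesystem (use /dev/md2 directly without LVM)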

CharlieSpencer

Hardware RAID is much faster than software. How do you conclude software RAID is more reliable? Software RAID is completely dependent on the hard drive(s) the OS is installed on. I've seen more hard drive failures over the years than RAID controller failures. A hardware controller also introduces additional tools to monitor the status of the drives. The cost of the controller doesn't add that much to the cost of a stand-alone server, and it's built into most SANs.

wdewey@cityofsalem.net

Software raid only provides redundant disks. Hardware raid can provide redundant controllers, so that if you lose a controller your array continues to function. SW RAID requires a boot sector that is a non-RAID sector. If that drive or sector gets corrupted then your machine will no longer boot, but with HW raid your machine will still boot no matter which of the redundant drives is corrupted. Bill

paulenet

It appears that you are incredibly misinformed about HW RAID, or perhaps your experience with HW RAID was with some lousy controllers. You need to understand that there are pros and cons to SW / HW RAID. HW RAID has a higher initial cost, but most large enterprise environments see the benefits of HW RAID overwhelmingly outweighing that cost, and more to the point, the performance and reliability benefits over SW RAID.

If SW RAID were so much better than HW RAID, then most large enterprise environments would be running SW RAID, period. The fact is, the overwhelming majority of large enterprise environments run HW RAID, and they are not doing it because they don't understand SW RAID. Instead, large enterprise environments use HW RAID because they know it offers better system performance, better reliability, and the most flexibility.

Oh, and my Intel HW RAID controllers can convert a RAID-5 array to RAID-6 just fine, and will do it much faster than your SW RAID. Of course, the high-performance HW RAID controllers I use are not the RAID controllers you typically find at your local retail shops. Based on my experience with numerous large IT environments, the HW RAID controllers I use and work with are among the best available (Intel, LSI, Adaptec, etc.), and my experience (as well as that of my colleagues) with HW RAID does not reconcile with your nonsense about HW RAID.

On a side note, as if there were not already enough long-standing problems with RAID-5, when you consider the large hard drives currently ranging from 1TB - 2TB (and they certainly are not going to get smaller), it is easy to argue that RAID-5 should never be used in the first place, due to the likelihood of a drive failure followed by another failure during the rebuild. Friends don't let friends use RAID-5! http://www.miracleas.com/BAARF/BAARF_members_sql.php http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt

moabrunner

I have tried to use GParted to expand the partition, but it sees it as a Linux RAID and will not do anything with it. I guess this is where I am stuck. I make a full backup with Clonezilla before I do anything, so I can always blast it back if I screw it up.

paulenet

I am merely stating facts. Just because you don't like them, or how I presented them, doesn't make me arrogant. So please, keep your name calling to yourself.

Brainstorms

Your statement that SW RAID requires a non-RAID boot sector is not correct. I build my systems entirely on RAID 1 pairs -- boot, swap, OS, data, etc. -- everything. (My data drives are LVM partitions that sit on top of the RAID md's.) And while I don't agree with the arrogant attitude of the 'guy from Redmond', I do agree that the day & age of RAID 5 is long over. Disks are cheap enough to spend a few more dollars to double up on everything. Cost no longer justifies making your system do gyrations to maintain a parity slice so that you can get away with "only buying one additional drive". KISS applies here. Aside from reducing system load and disk wear & tear, you have the ability to remove a RAID 1 drive, put it in another system and run. You have no hope of doing that with RAID 5. (Makes cloning a system VERY easy.)

CharlieSpencer

Is your RAID configuration managed by the OS or by a hardware RAID controller? If the latter, you'll have to use the controller's management application.
