Development of filesystems on Linux has come far in the last few years and there are a number of important advancements that, when combined, can make for extremely flexible storage options for a Linux desktop or server. RAID support in Linux has been around for a long time, and so has LVM (Logical Volume Management) support. But have you ever considered putting the two together?
Individually, each has its own strengths and weaknesses. Together, they provide very flexible storage options. RAID support in Linux offers a number of levels. Striping (RAID0) combines disks into one large volume, spreading data across the member disks for very good performance. Mirroring (RAID1) is great for redundancy: data written to the array is written to all member disks simultaneously, so if one disk in the array dies, the array can continue in degraded mode (running on the surviving disk(s)) without data loss. With RAID0, data loss is a real risk: if one drive in the array dies, the data striped onto it dies with it. RAID10 combines mirroring (RAID1) and striping (RAID0) to provide both performance and redundancy, but requires at least four disks.
LVM, on the other hand, is a mechanism for easily managing storage across various hard drives. It is similar in many respects to RAID0 or JBOD in that it can create logical volumes that span multiple hard disks. The primary benefit of LVM is that it allows you to resize the logical volumes inside a volume group easily: if one volume needs more space and another has room to spare, the allocation can be adjusted non-destructively. Couple that with ext4's support for growing a mounted file system on the fly (shrinking requires unmounting it first) and LVM becomes very attractive indeed. It also makes creating backups easy, thanks to its snapshot feature.
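As a rough sketch, a snapshot-based backup might look like the following. The volume group and mount-point names here are hypothetical, and the volume group needs free extents to hold the snapshot's copy-on-write data:

```shell
# Create a 5GB copy-on-write snapshot of the "home" logical volume
lvcreate --snapshot --size 5G --name home_snap /dev/vg0/home

# Mount it read-only and back it up at leisure; the snapshot is a
# frozen, consistent view even while /home stays in use
mkdir -p /mnt/snap
mount -o ro /dev/vg0/home_snap /mnt/snap
tar -czf /backup/home-backup.tar.gz -C /mnt/snap .

# Clean up once the backup is done
umount /mnt/snap
lvremove -f /dev/vg0/home_snap
```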
Combining the two together, using LVM on top of RAID1 for mirroring, provides flexibility and redundancy. For instance, suppose you had a system with two 1TB hard drives. Traditionally you might use the disks individually, but should one of them die, the data on that drive would be lost (subject to backup policies, of course). You could instead tie the two drives together using RAID1, giving you 1TB of usable space rather than 2TB, but if one drive dies, it’s an easy matter to replace it and let the RAID array re-sync. You could have partitions on the drive for /boot, /, /home, /var, and /srv but without LVM each would be static in size. If you found out later that /var was too small, you would have some serious time-consuming work ahead of you to adjust partitions in order to make room for it.
Instead, you could create two partitions on each drive: /boot (as one RAID1 array, md0) and another not mounted that would be the physical volume for an LVM (md1). It would look like this:
# fdisk -l /dev/sda
Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000408c9
Device Boot Start End Blocks Id System
/dev/sda1 * 2048 2050047 1024000 fd Linux raid autodetect
/dev/sda2 2050048 1953523711 975736832 fd Linux raid autodetect
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda2 sdb2
975735676 blocks super 1.1 [2/2] [UU]
bitmap: 1/8 pages [4KB], 65536KB chunk
md0 : active raid1 sda1 sdb1
1023988 blocks super 1.0 [2/2] [UU]
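For reference, a pair of mirrors like these could be created by hand with mdadm along these lines. This is a sketch: the device names and metadata versions are taken from the output above, but double-check them against your own system before running anything, as these commands destroy existing data on the partitions:

```shell
# The /boot mirror; 1.0 metadata sits at the end of the device,
# which keeps the start of the partition readable by the bootloader
mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=1.0 /dev/sda1 /dev/sdb1

# The large mirror that will become the LVM physical volume
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2

# Record the arrays so they assemble automatically at boot
mdadm --detail --scan >> /etc/mdadm.conf
```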
Here we have our two RAID arrays. The first RAID array (md0) is mounted as /boot. The second RAID array (md1) is the basis of our physical volume, as pvs shows:
PV VG Fmt Attr PSize PFree
/dev/md1 vg_cerberus lvm2 a- 930.53g 0
If we had another RAID array (say, a third and fourth drive also set up as RAID1), we could create another physical volume for use in LVM and add it to the volume group as well, combining the two arrays (or devices) into one pool of storage. In this case we only have one, and vgs reports:
VG #PV #LV #SN Attr VSize VFree
vg_cerberus 1 4 0 wz--n- 930.53g 0
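Building this layer by hand is only two commands; a sketch, using the device and group names from the output above:

```shell
# Label the RAID array as an LVM physical volume
pvcreate /dev/md1

# Create the volume group on top of it
vgcreate vg_cerberus /dev/md1
```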
This is the vg_cerberus volume group, which is built on one physical volume and has four logical volumes inside of it, as lvs lists:
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
home vg_cerberus -wi-ao 97.66g
root vg_cerberus -wi-ao 29.31g
srv vg_cerberus -wi-ao 801.59g
swap vg_cerberus -wi-ao 1.97g
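Logical volumes like these are carved out with lvcreate. A sketch, with illustrative sizes rather than the exact ones above:

```shell
# Carve out the individual volumes from the group
lvcreate -L 30G  -n root vg_cerberus
lvcreate -L 100G -n home vg_cerberus
lvcreate -L 2G   -n swap vg_cerberus
# Give srv everything that remains
lvcreate -l 100%FREE -n srv vg_cerberus

# Put file systems (and swap) on them
mkfs.ext4 /dev/vg_cerberus/root   # likewise for home and srv
mkswap /dev/vg_cerberus/swap
```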
These logical volumes can be dynamically resized if the need arises, using the lvextend and lvreduce tools (along with resize2fs to resize the ext4 file systems). The four logical volumes (or partitions) are called home (mounted as /home), root (/), srv (/srv), and swap. These partitions are mounted like any other partition, however they are mounted using their device-mapper device names:
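Growing a mounted ext4 volume is a two-step affair; shrinking one means unmounting first. A sketch of moving roughly 10GB from /srv to /home, assuming /srv actually has that much free:

```shell
# Shrink /srv: unmount, check, shrink the file system, then the LV.
# The file system must end up no larger than the reduced volume.
umount /srv
e2fsck -f /dev/vg_cerberus/srv
resize2fs /dev/vg_cerberus/srv 790G
lvreduce -L -10G /dev/vg_cerberus/srv
mount /srv

# Grow /home into the freed space; ext4 grows while mounted
lvextend -L +10G /dev/vg_cerberus/home
resize2fs /dev/vg_cerberus/home
```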
# mount | grep mapper
/dev/mapper/vg_cerberus-root on / type ext4 (rw)
/dev/mapper/vg_cerberus-home on /home type ext4 (rw)
/dev/mapper/vg_cerberus-srv on /srv type ext4 (rw)
While all of this sounds complicated, and can be when constructed by hand with the command-line tools, it is easy to accomplish with GUI tools at installation time if your distribution of choice provides them (I imagine most do). Fedora 14, for instance, made this setup a point-and-click affair during installation.
The thing to remember is that /boot cannot be on an LVM. Some distributions, particularly older ones, may not like /boot being on a RAID array either, so you may need to have /dev/sda1, for instance, mounted as /boot and /dev/sdb1 mounted as /boot2 (with a daily rsync to make sure the contents of the /boot partition are synced in case of hardware failure and a need to boot off the other drive).
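In that arrangement, a daily cron job along these lines would keep the spare copy current (a sketch; the /boot2 mount point is from the scenario above):

```shell
# Mirror /boot onto the second drive's partition; --delete keeps
# the copy exact by removing files that no longer exist in /boot
rsync -a --delete /boot/ /boot2/
```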
Fedora 14 permits booting off of a RAID array, so it was easy to assign /dev/sda1 and /dev/sdb1 to the RAID1 array (/dev/md0). The second partitions on each disk (/dev/sda2 and /dev/sdb2) were the size of the rest of the disk, and assigned to /dev/md1. After that, the LVM was configured and /dev/md1 was assigned to the physical volume, and the logical volumes were then each defined. It took roughly five minutes to create the entire thing at install.
The end result is that if I need to give /home more space and /srv has some to spare, I can do it very easily without rebooting the system. If one drive should fail, it’s a simple matter to pull that drive out, replace it, and create a similar partition layout and re-add the two partitions to reconstruct the RAID array. Because the RAID array can run in degraded mode, the downtime is reduced to how quickly the drive can be physically replaced when the system is powered down (hot swappable drives would reduce this to almost nothing).
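The replacement itself is largely mechanical. Assuming /dev/sdb died and a fresh disk now sits in its place, recovery might look like this sketch (sfdisk's dump/restore copies the MBR partition table from the healthy drive, so verify the device names carefully before running it):

```shell
# Mark the dead partitions failed and remove them from the arrays
# (if the kernel has not already done so)
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2

# After swapping the hardware, clone the partition layout from sda
sfdisk -d /dev/sda | sfdisk /dev/sdb

# Re-add the partitions; the arrays re-sync in the background
mdadm /dev/md0 --add /dev/sdb1
mdadm /dev/md1 --add /dev/sdb2
```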
With this setup, I have the ultimate flexibility in how space is allocated on the system and can adjust it as required, knowing that if one drive fails, the data is protected.
And if the drives that are available simply cannot meet my storage needs, adding another pair and folding the new space into the volume group to extend the existing logical volumes is a piece of cake too.
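Assuming the new pair became /dev/md2 (a hypothetical name), folding it into the existing volume group takes just a few commands; a sketch:

```shell
# Mirror the new pair of drives
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1

# Hand the new array to LVM and grow srv into it, online
pvcreate /dev/md2
vgextend vg_cerberus /dev/md2
lvextend -L +500G /dev/vg_cerberus/srv
resize2fs /dev/vg_cerberus/srv
```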