File system migrations using tar and dump

Scott Reeves answers a reader's question about migrating an existing ext4 file system to LVM. Here are some tips about how to copy the old files to a new file system using dump and tar.

A question that has come up recently relates to moving an existing ext4 file system so that it is using Logical Volume Manager (LVM). This is not something that can be done without recreating the file system. Whilst this is not ideal, it is not an onerous task to migrate a file system to use LVM, and in this tip, I'll give a couple of ways to copy the files across to a new file system.

Assuming you have a new disk, you can partition it, create an LVM logical volume, create a file system and mount the file system. Once the file system is mounted, the next step is to copy files from the old file system to the new file system. There are a couple of useful commands that can do this. The commands used in this tip are dump and tar. Another command that can also be used is cpio. For brevity, cpio is omitted in this tip, but will most likely be the subject of a later post.
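As a rough sketch of those preparation steps, the sequence might look like the script below. The partition (/dev/sdb1), the volume group and logical volume names, and the 20G size are all hypothetical placeholders, not values from my system, and the script assumes the partition has already been created. By default it only prints the commands; the real ones need root.

```shell
#!/bin/sh
# Sketch only: every name below is a hypothetical placeholder.
PART=/dev/sdb1        # partition on the new disk (assumed to exist already)
VG=vg_new             # volume group name (assumed)
LV=lv_db01            # logical volume name (assumed)
SIZE=20G              # logical volume size (assumed)
MNT=/db01             # mount point for the new file system

# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

lvm_setup() {
    run pvcreate "$PART"                     # mark the partition as an LVM physical volume
    run vgcreate "$VG" "$PART"               # create a volume group on it
    run lvcreate -L "$SIZE" -n "$LV" "$VG"   # carve out a logical volume
    run mkfs.ext4 "/dev/$VG/$LV"             # create the ext4 file system
    run mkdir -p "$MNT"                      # make sure the mount point exists
    run mount "/dev/$VG/$LV" "$MNT"          # mount it, ready for the copy
}

lvm_setup
```

Once the printed commands look right for your disk, running the script as root with DRY_RUN=0 would execute them for real.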

Both dump and tar can write to standard output, and their counterparts (restore and tar x) can read from standard input. This is ideal if you want a one-line command string that copies files from one file system to another. In addition, the output can be piped through gzip. One reason for using gzip is when copying the files across a network (or to/from an NFS drive or any other network-mounted drive), where it can cut down on the network traffic. It does, however, add processing overhead at the operating system level and, on a local copy, takes more time.

The first example uses tar without gzip. Here the data01 directory is to be moved into the new file system mounted on /db01. The command string is as below.

tar cf - data01 | (cd /db01; tar xvf - )

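Note the use of "-" after the f option. With c, it tells tar to write the archive to standard output instead of to a file. The v option is usually left off the tar cf side: GNU tar sends its file listing to standard error when the archive goes to standard output, so v would not corrupt the archive, but it would duplicate the listing already printed by the extracting tar. The archive is piped into a subshell (the commands in parentheses), which changes directory to the desired target and then runs tar with the x option, again with "-". This time it tells tar to read the archive from standard input.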
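To try the pipeline safely before touching real data, something like the sketch below can be run as an ordinary user. It uses throwaway temporary directories as stand-ins for the old file system and /db01, and the sample file names are made up for illustration.

```shell
#!/bin/sh
# Self-contained demo: temporary directories stand in for the old file
# system and for /db01. The file names are hypothetical.
src=$(mktemp -d)
dst=$(mktemp -d)

mkdir -p "$src/data01/sub"
echo "sample row" > "$src/data01/sub/table.dat"
ln -s sub/table.dat "$src/data01/link"

# Archive to standard output, extract from standard input in the target.
# Using && instead of ; means tar is not run if the cd fails.
(cd "$src" && tar cf - data01) | (cd "$dst" && tar xf -)

# Compare the two trees; diff prints nothing when they match.
if diff -r "$src/data01" "$dst/data01"; then
    echo "copy verified"
fi
```

The one deliberate difference from the command above is cd ... && tar rather than cd ...; tar. With a semicolon, a mistyped target directory would leave tar extracting into whatever directory you happened to be in.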

Below is the version with gzip. This is similar to the above, but with a gzip/gunzip added in. The -c option tells gzip to write its compressed output to standard output; with no file names given, gzip reads its input from standard input. For gunzip, -c likewise sends the decompressed output to standard output.

tar cf - data01 | gzip -c | (cd /db01; gunzip -c | tar xvf - )
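A quick way to see what -c does is to push a line of text through the same gzip/gunzip pair. This is only an illustrative sketch; the sample text is arbitrary.

```shell
#!/bin/sh
# With no file names given, gzip reads standard input; -c sends the result
# to standard output, so the text survives the round trip unchanged.
roundtrip=$(printf 'hello from data01\n' | gzip -c | gunzip -c)
echo "$roundtrip"

# A gzip stream starts with the magic bytes 1f 8b, which confirms that the
# middle of the pipeline really carries compressed data.
magic=$(printf 'hello from data01\n' | gzip -c | od -An -tx1 -N2 | tr -d ' \n')
echo "$magic"
```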

As mentioned above, using gzip adds some overhead for the process.  To illustrate this, both the above commands can be run with the time command as a prefix. This gives the following results:

With gzip:

real  0m6.878s
user  0m5.684s
sys   0m0.836s

Without gzip:

real  0m3.173s
user  0m0.040s
sys   0m0.524s

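In this local test, the copy without gzip took less than half as long as the copy with gzip (about 3.2 seconds against 6.9 seconds). The extra cost shows up as user CPU time (5.7 seconds against 0.04 seconds), which is the work of compressing and decompressing. If you are copying files over a network, then using gzip will lessen the load on the network, at the cost of that extra processing.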
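To get a feel for how much network traffic gzip might save, you can count the bytes in each stream instead of timing it. The sketch below does this with made-up, highly repetitive sample data, so the saving it reports is far better than you should expect from real data (and data that is already compressed may not shrink at all).

```shell
#!/bin/sh
# Build a throwaway data01 directory with repetitive, compressible content.
src=$(mktemp -d)
mkdir "$src/data01"
i=0
while [ "$i" -lt 2000 ]; do
    echo "row $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > "$src/data01/table.dat"

# Count the bytes that would cross the pipe (or the network) in each case.
raw_bytes=$(cd "$src" && tar cf - data01 | wc -c | tr -d ' ')
gz_bytes=$(cd "$src" && tar cf - data01 | gzip -c | wc -c | tr -d ' ')

echo "uncompressed stream: $raw_bytes bytes"
echo "gzip stream:         $gz_bytes bytes"
```

Pointing the two tar commands at a sample of your own data gives a better idea of whether the compression is worth the CPU time.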

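The other commands to use when copying file systems are dump and restore. These are old commands but still very useful. The command syntax is simple, but care must be taken in using it. In the sample run below, 0 requests a level 0 (full) dump and f - sends it to standard output. On the restore side, -r rebuilds the dumped tree in the current directory, -u removes existing files before recreating them, and -f - reads the dump from standard input. Below is a sample run of the dump/restore commands: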

root@gudaring:/db01# dump 0f - data01 | (cd /db02; restore -ruf -)
  DUMP: Date of this level 0 dump: Mon Oct 24 22:47:18 2011
  DUMP: Dumping /dev/mapper/vg01-home (/db01 (dir /data01)) to standard output
  DUMP: Label: none
  DUMP: Writing 10 Kilobyte records
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 25615 blocks.
  DUMP: Volume 1 started with block 1 at: Mon Oct 24 22:47:19 2011
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
./lost+found: (inode 11) not found on tape
  DUMP: Volume 1 completed at: Mon Oct 24 22:47:26 2011
  DUMP: Volume 1 25570 blocks (24.97MB)
  DUMP: Volume 1 took 0:00:07
  DUMP: Volume 1 transfer rate: 3652 kB/s
  DUMP: 25570 blocks (24.97MB)
  DUMP: finished in 7 seconds, throughput 3652 kBytes/sec
  DUMP: Date of this level 0 dump: Mon Oct 24 22:47:18 2011
  DUMP: Date this dump completed:  Mon Oct 24 22:47:26 2011
  DUMP: Average transfer rate: 3652 kB/s
  DUMP: DUMP IS DONE

As before, this could also be done using gzip and gunzip.

dump 0f - data01 | gzip -c | (cd /db02; gunzip -c | restore -ruf -)

Again, this will take extra time, and is probably best suited for copying over a network. Using dump requires care not to mix up the target with the source. Always check the man page for the syntax if unsure.
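One way to guard against mixing up the source and the target is a small check before the copy starts. The function below is a minimal sketch of that idea; check_copy_dirs is a hypothetical helper name of my own, not part of dump or restore, and the demo uses temporary directories rather than real file systems.

```shell
#!/bin/sh
# check_copy_dirs SOURCE TARGET  (hypothetical helper, not a standard command)
# Succeeds only if both directories exist and resolve to different paths.
check_copy_dirs() {
    cc_src=$(cd "$1" 2>/dev/null && pwd -P) || {
        echo "no such source directory: $1" >&2
        return 1
    }
    cc_dst=$(cd "$2" 2>/dev/null && pwd -P) || {
        echo "no such target directory: $2" >&2
        return 1
    }
    if [ "$cc_src" = "$cc_dst" ]; then
        echo "source and target are the same directory: $cc_src" >&2
        return 1
    fi
    return 0
}

# Demo with throwaway directories.
a=$(mktemp -d)
b=$(mktemp -d)
check_copy_dirs "$a" "$b" && echo "ok to copy"
check_copy_dirs "$a" "$a/." || echo "refused"
```

In practice the check would sit in front of the real command, for example check_copy_dirs /db01/data01 /db02 followed by && and the dump pipeline.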

This is a fairly straightforward procedure to copy files from an old file system to a new one. In this case, the old file system is an ext4 file system. The new file system is still ext4, but it is using LVM as the volume manager, giving extra flexibility, and potentially meaning future migrations could be made easier by using LVM mirroring to migrate data.

About

Scott Reeves has worked for Hewlett Packard on HP-UX servers and SANs, and has worked in similar areas in the past at IBM. Currently he works as an independent IT consultant, specializing in Wi-Fi networks and SANs.

3 comments
Brainstorms

There's a very simple way to do this, whether all within the same machine or across a network: Use 'rsync', one of the handiest commands around. With the partition for the existing FS mounted, and the new partition (already formatted) mounted & ready, all you need is this:

rsync -avx old-fs-mount-point/ new-fs-mount-point/

Take note of the trailing '/' characters; they're significant. If the destination is on a different system, then use this form:

rsync -avx old-fs-mount-point/ user@host:new-fs-mount-point/

Or go the other way:

rsync -avx user@host:old-fs-mount-point/ new-fs-mount-point/

The 'a' switch is critical; it means do an "archive" type of copy, i.e., everything, symlinks & all. The 'x' switch is important for copying file systems this way; it means "don't cross file system boundaries", i.e., copy only "this partition" and skip over any partitions mounted in the source FS tree. The 'v' is optional; it means list each directory and file on the terminal as it's copied. (Adding a -P switch will give each file a percent complete status as well.)

'rsync' has many useful options, too. One valuable one is '--exclude=' which tells 'rsync' to skip particular files or directories when copying. ('-x' is really an '--exclude' for a FS mounted internal to your FS being copied, and you don't have to figure out where/what they are.)

It makes no difference to 'rsync' if the FS is mounted as a straight partition or as an LVM volume, or even as either type mounted on a remote machine across the network. It's also efficient, but that's mostly a benefit if you're updating a previous copy...

brf531

Given the defaults and options that 'tar' supports, the command:

tar cf - data01 | gzip -c | (cd /db01; gunzip -c | tar xvf - )

can be simplified to:

tar cz data01 | ( cd /db01; tar xzv )

You can substitute 'j' for 'z' to use 'bzip2' compression. Without the data compression it becomes:

tar c data01 | ( cd /db01; tar xv )

Not to be picky, but us IT folks are really lazy and HATE to type!

bjswm

I have successfully done this sort of thing with cp -ax.