Disaster Recovery

How do I... Back up an entire hard drive under UNIX?

Backing up an entire disk under any operating system is a necessary evil. Justin James walks you through the process needed to create an identical copy of an entire disk under UNIX.


This article walks you through the process of creating an identical copy of an entire disk under UNIX, which is useful for creating a system image for installation or backup purposes. You will need a second hard drive of the same or larger size as the destination for the backup. The disk created will be fully bootable; in the event of a drive failure, this backup drive can be swapped in for the failed drive, and the system will boot into the same state it was in when the backup was made. While booting off a drive that believes it belongs to a powered-on system is not ideal, it can get you back into operation quickly. In addition, the backup disk can serve as a perfect copy of the file structure in case of permission problems or data corruption.

NOTE: You must be logged in as root to perform these operations.

Verify and prepare disks

The first thing to do is to verify that the second disk is recognized by your system. To find out, perform a directory listing on /dev, as shown in Figure A.

Figure A

A directory listing

On this particular system (FreeBSD 6.1 with two IDE hard drives), the first drive is listed as ad0 and the second drive is listed as ad1. Check with your particular UNIX's documentation to find out its exact pattern for listing disks. For the purpose of this article, ad0 (the master drive on the first IDE channel) will be the source drive for the backup and ad1 (the slave drive on the first IDE channel) will be the destination drive for the backup.
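For example, on a system following the FreeBSD naming convention, something like the following would show whether both IDE drives are present (adjust the pattern for your particular UNIX):

```shell
# Show device nodes for IDE (ATA) disks, which FreeBSD names ad0,
# ad1, and so on; SCSI disks appear as da*, and other UNIX variants
# use different naming schemes entirely.
ls /dev | grep '^ad' || echo "no ad* devices found"
```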

Once you have verified that your disk has been recognized by the operating system, you need to properly partition and label the disk. To initialize the disk, we run the following commands:

  1. dd if=/dev/zero of=/dev/ad1 bs=1k count=1
     This command writes a single 1KB block of zeroes to the start of the disk to ensure that it is clean.

  2. fdisk -BI ad1
     This command creates a disk slice that uses the entire disk and initializes it. If you wish to use a boot loader other than the default for this disk, you will need to consult the documentation for fdisk, or manually install a different boot loader on this drive.

  3. disklabel ad0s1 > /tmp/savedlabel
     Save the disk label of our source drive to a temporary file.

  4. disklabel -R -B ad1s1 /tmp/savedlabel
     Write the original disk label to the new drive.

  5. newfs /dev/ad1s1a
     newfs /dev/ad1s1b
     newfs /dev/ad1s1d
     newfs /dev/ad1s1e
     newfs /dev/ad1s1f
     Here, we create a new file system on each of the partitions that we created. These partition letters are the FreeBSD defaults. You may want to perform a directory listing on /dev to ensure that you are creating a file system on each partition that you created on disk ad1.

  6. mkdir -p /_bk
     mkdir -p /tmp_bk
     mkdir -p /usr_bk
     mkdir -p /var_bk
     This creates mount points for the new file systems, using the names for the backups. As always, your particular file system layout may be a bit different.

  7. vi /etc/fstab
     This brings up the fstab file in vi for editing. We need to edit this file so that the system is able to use the new partitions upon reboot. If you do not like vi, you may use the text editor of your choice. The original fstab file is shown in Figure B.
     We are going to replicate all of the information for disk ad0 so that the new partitions on disk ad1 are mounted under / with the same names as the original partitions, with _bk appended to the name, using the same names that we did in Step 6. Do not create an entry duplicating the swap partition (as noted by the FStype column). The fstab file should look like Figure C when finished.

Figure B

Original fstab

Figure C

Changes made to fstab
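Figure C is not reproduced here, but based on the partition letters used in this article, the finished fstab would plausibly look something like this sketch (device names and mount options are assumptions; the defaults on your system may differ). Note that there is no second swap entry for ad1s1b:

```
# Device        Mountpoint  FStype  Options  Dump  Pass#
/dev/ad0s1b     none        swap    sw       0     0
/dev/ad0s1a     /           ufs     rw       1     1
/dev/ad0s1e     /tmp        ufs     rw       2     2
/dev/ad0s1f     /usr        ufs     rw       2     2
/dev/ad0s1d     /var        ufs     rw       2     2
# Added entries for the backup disk (no duplicate swap entry):
/dev/ad1s1a     /_bk        ufs     rw       2     2
/dev/ad1s1e     /tmp_bk     ufs     rw       2     2
/dev/ad1s1f     /usr_bk     ufs     rw       2     2
/dev/ad1s1d     /var_bk     ufs     rw       2     2
```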
  8. Now, we test the new file systems and mount them without needing to reboot, using the directory names and partitions in accordance with the fstab changes that we made:

mount /dev/ad1s1a /_bk
mount /dev/ad1s1e /tmp_bk
mount /dev/ad1s1f /usr_bk
mount /dev/ad1s1d /var_bk

  9. The final step is to perform the backup itself:

cd /_bk && dump -L -0 -f - / | restore -r -f -
cd /tmp_bk && dump -L -0 -f - /tmp | restore -r -f -
cd /usr_bk && dump -L -0 -f - /usr | restore -r -f -
cd /var_bk && dump -L -0 -f - /var | restore -r -f -

This runs the dump command, which performs a full file system copy, and immediately redirects its output to the restore command. The -L switch tells dump to use a file system snapshot if possible, since we are working on a live file system; you may omit -L if you are not backing up a live file system. If we did not want to image the backup disk immediately, we could direct dump to output to a file, which can later be used with the restore command to re-image a partition.
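If you take the dump-to-a-file route, it might look like this sketch (the path /backup/usr.dump is hypothetical; in practice it would live on some other mounted volume with enough free space):

```shell
# Write a full (level 0) dump of /usr to a file instead of piping
# it straight into restore.
dump -L -0 -f /backup/usr.dump /usr

# Later, re-image a freshly created file system from that file:
cd /usr_bk && restore -r -f /backup/usr.dump
```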

  10. Edit the file /_bk/etc/fstab to remove all references to the second disk, so that it looks like the original fstab file. This way, if you ever boot off this duplicate drive, the system will not try to mount file systems on a drive that does not exist, since this secondary disk will be moved into the first disk's position. Alternatively, you could edit the fstab file on the backup disk (/_bk/etc/fstab) so that it has no references to the original disk; then you could remove the original disk without moving the backup disk, and it would boot normally. If you choose this route, make sure that you have a reference to ad1s1b as a swap partition. How you go about this depends wholly upon how you prefer to perform the switchover in the event of disk failure.
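If you go the second route (pulling the original disk and leaving the backup where it is), the edited /_bk/etc/fstab might look like this sketch, with ad1s1b now carrying the swap entry (device names are assumptions based on the layout used in this article):

```
# Device        Mountpoint  FStype  Options  Dump  Pass#
/dev/ad1s1b     none        swap    sw       0     0
/dev/ad1s1a     /           ufs     rw       1     1
/dev/ad1s1e     /tmp        ufs     rw       2     2
/dev/ad1s1f     /usr        ufs     rw       2     2
/dev/ad1s1d     /var        ufs     rw       2     2
```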

At this point, if you perform a directory listing of /_bk, /usr_bk, /tmp_bk, and /var_bk you will see that they look identical to what is contained in the original directories. To test your backup, shut the computer down, physically connect the backup disk with the same hardware settings as the original disk had, and restart the computer. Your computer should reboot in the state that it was in at the time of the backup.
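As a quick spot check, a recursive diff between a source directory and its backup copy should report no differences. This is a sketch using paths from this article:

```shell
# diff -r exits 0 (and prints nothing) when the two trees match.
diff -r /etc /_bk/etc && echo "backup matches source"
```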

Final thoughts

Keep in mind, however, that this is not necessarily an ideal situation; since you performed an exact duplicate of the disk, the system is going to "think" that it had an improper shutdown. The backup disk, while bootable and usable, is best used as a near-line backup of the full file system, not as a way to immediately recover a down system. While it can be used for this purpose, the state of the backup (that is, still being considered powered-up in terms of things like lock files, PID files, and so on) may not be ideal for your needs. It is also possible to write a script to be run through a crontab at a regular interval to keep your backup up to date. Using dump as the cornerstone of a backup strategy is a time-tested, proven, well-supported, and well-documented technique.

About

Justin James is the Lead Architect for Conigent.

15 comments
A.C

Wouldn't it be a lot easier to use RAID if you are going to have a second drive in your machine for "backup"? You can find really cheap RAID cards or use software RAID in just about any flavour of *nix on a PC. If you don't need a live backup, then a dd of the entire disk will do for a standby drive (google "dd copy entire disk" or similar to get a million and one links on how to perform this simple op with just a single dd command)

Mark W. Kaelin

When was the last time you backed up your data? Has it been more than a week? More than a month? Are you playing with fire now?

Justin James

Yes, I agree that RAID is indeed an easier way of handling data redundancy, however, RAID does not address the "whoops! Didn't mean to delete that file!" problem. RAID also does not allow you to take your backups offsite. Even in this day of inexpensive RAID systems (the desktop I am currently building will have 2 RAID 1 arrays just for redundancy), the old-fashioned backup still has a purpose. My experience has been that having a near line backup handy is a real blessing, because of the speed and ease advantages of restoring files from a locally stored, live file system, as opposed to having to find the tape, put it in, and restore, etc. J.Ja

DanLM

1) Will it work over a LAN? 2) If a Windows system had to access the backup drive when the UNIX box was down, can it be done? Could I move the backup drive to a Windows machine and use it somehow (Partition Magic changing the file system?) Dan

DanLM

Time to go buy that extra hard drive and start doing at least weekly backups using what was laid out here. No chit. Dan

DanLM

My home file server is a BSD 6.1 box, just as in the article. And I really don't have a backup plan in place for it. Actually, I have been wondering exactly what to do about it, to be truthful. I knew about the dump command, just had never really seen a good example of it in use. I'm not sure I will use this for my system drive, but I may use this example for backing up my 250 gig which contains all my data. I'm more worried about losing my data than I am about losing my system drive. Considering Sam's Club has an external 250 gig for only 140, I may do this tonight. Hey Justin, thanks for the article. Any chance you could cover some topics of the pfctl firewall that was ported from OpenBSD to FreeBSD? I'm having issues with flippen brute force attempts on my home machine. I've put in place ftp/ssh log parsing routines that execute every 2 minutes which firewall these twits. And I have throttles built into my firewall. But I know I haven't scratched the surface with what I can do with pfctl, and I'm always looking for more information on it. I'm also looking for some examples of reading the pfctl log; I'm considering writing an analysis script to run against it. As it is, I only list off things that have been blocked by specific firewall rules. Dan

bearsaxman

In addition to not accounting for accidental deletion of files, should your computer become infected with a virus, you simply have more copies of that same virus on a RAID array. RAID is for hardware redundancy and high availability. Backups are for data security and for instances, though rare, where multiple disks fail in a RAID array that cannot handle it.

Justin James

Reminds me of all of the times I was working on the car and said, "maybe I should put on [insert safety equipment here] to do this?" and then suffered an injury that said safety equipment would have prevented. These included dual 3/8" wide, 1/2" deep punctures in my biceps from rusty bolts, gasoline in the eyes, and severely cut fingers. I almost lost everything this week from a dying hard drive, but the RMAed drive came in just in time, and I do nightly backups to my home server from the desktop. And in a few short weeks, the server is being replaced by one with RAID 1, and the desktop is being replaced with one that has two RAID 1 arrays, one for data and one for system/applications. Because my data is too valuable to be lost at this point, including over 6 years' worth of emails and 15+ years of documents and code. J.Ja

Justin James

Dan - It is great to know that someone was able to get something out of this article! It is also nice to know that I am not the only person using FreeBSD at home for a server. :) I will look into the pfctl firewall and see what I can do in terms of an article for it. Just as a super-quick suggestion, if you want to parse those log files, Perl will be your best friend, unless you are having the logs dump to a database. Regardless, parsing log files every few minutes, especially in the face of a brute force attack, can hit the CPU on the server pretty hard, so I can see why you would prefer to have it happen in the firewall itself automagically. As soon as I receive the hardware for it (supposed to have been shipped today), I will be writing an article on assembling a router/firewall running IPCop. Why wait for hardware? Because it is going to be using a TK-63T system from eWay, using ultra-low amounts of power, fanless, and the only moving part will be a hard drive. This baby will be able to act as a full router, firewall, VPN gateway, DHCP server, Web proxy & cache, perform whitelist/blacklist blocking, gateway virus/spam scanning and blocking, and (drumroll folks!) support a dialup failover interface in case the network connection goes down. If you are running a home server, this is definitely an article you will not want to miss (look for it in a few weeks). Even if the hardware end of it does not appeal to you, and you would rather just throw IPCop on an old PC (its system requirements are delightfully low; even the TK-63T is tremendously overpowered for the bulk of its functionality), the IPCop end of the article should be useful to you. J.Ja

DanLM

And not stop or block the attempts. I was going to stay with my application feeding the firewall table file for that. The only thing that would really change if I switched ssh/ftp applications, that I can think of, is the regex/log file. I had previously used a very good Python application called DenyHosts that I found through SourceForge. It works by appending offending IPs to the hosts.allow or hosts.deny file. It works, and it works well, actually. Lots of good write-ups on it if you Google it. But I wasn't real thrilled with having my hosts.allow file modified automatically. And I never correctly figured out the syntax of hosts.allow to pull in an external file of IPs. It can be done, I just never got the syntax right. That, and I don't think hosts.allow allows for CIDRs. Which I use. My logs don't go to a database; they are going straight to log files that are rolled through syslogd. I clean them out now and again. Which I should probably automate to only keep so many copies. I know I need to do that with my ftp logs, cause I have 2 months of them. I kind of like the idea of using a db for statistical purposes though. Only offending IPs/CIDRs being stored though, which with my blacklist is now at about 900. Chuckle, I don't know if the girlfriend would let me have another PC. Lol, I already am running 2 plus hers. That, and I'm running out of plugs where I have these machines at. It's an apartment. You're right, I am on broadband and I'm sure they are just pounding the snot out of the IP block. Doesn't make it any less annoying though. Thanks Justin, appreciate the feedback on this. I have been struggling with these attacks for a couple years. I just keep tightening down the machine more and more as I learn new things. Dan

Justin James

Dan - It sounds like you are definitely on the right track. Sadly, even with the firewall in place on the server, you are still completely jamming up your server under a brute force attack, even if it is just to wake up a process to kill the connection! Here are some suggestions that may help:

* If your log files are going to a database, try having it routinely purge older, less relevant entries to a compressed and/or read-only table/database. This way you can do your reporting and search for offenders on a table optimized for it, while also freeing up the transactional, "right now" database to better respond to immediate items. It will also save you a bunch of space on your drive!

* If you can afford to, shift the blocking to a separate router/firewall device. That is one reason why I am looking into IPCop; I want my server to be free to handle actual work, not blocking garbage! In addition, a number of DoS attacks rely upon doing things like hanging on an ACK/NAK sequence; even if they do not successfully break the server through a brute force attack, they can hang it up anyway!

* Restricting by originating country is a great start. Remember though, these guys may very well have root'ed themselves on a US-based server, or even something that you should be able to trust! Real-time behavior monitoring is the way to go.

* If you get few enough hits to FTP/SSH, you may want to change from a blacklist methodology to a whitelist methodology.

* A lot of this type of traffic is completely automated; chances are, your server is just one of a million. If you are on a cable or DSL connection, they are just blasting the ISP's IP block, looking for the one or two boneheads that plugged an unsecured OS directly into their modem and happen to have an FTP, SSH, or SMTP server up and running with default passwords. Heck, all it takes is for you to be building a new *NIX box with default options (depending on the distro), and open that firewall *before* changing default passwords, and if they get lucky, BANG! you're owned.

* Personally, I would not dedicate an entire "real" PC to firewall purposes, simply because of the power usage! Unless it is something like an old OEM machine built with the cheapest components, chances are you've got a 300+ W PSU spinning big, clunky, inefficient drives and a zillion fans. If you are spending $10/month on the juice to run that, in 2 years you bought yourself a small, energy-efficient embedded system. Think $10/month is unrealistic? I don't. In the winter time, my power bills average $55/month; that is one 60W incandescent light bulb (average 5 hours/day), a refrigerator, and 2 PCs on a 1.5 kVA UPS (which adds some additional energy inefficiency). I have no other appliances which use electricity, except the occasional stove or microwave use, a cell phone charger, and an alarm clock (I live spartan). So in all fairness, it is probably costing me $15+/month/PC! Looked at that way, it is hard to afford *not* to replace a dedicated PC for firewalling with an energy-efficient system!

* Restricting the number of open ports is always the first thing to do, so it's great that you did it. I once stuck my server in the DMZ, and had to take it out within an hour because of the traffic coming in. Yes, BSD is extremely secure compared to many Linuxes and Windows. But I find that it is always better to stop it at the gateway, and not worry about someone slamming me with port scans constantly.

I know we're way off topic now, but it is a good thing. :) J.Ja

Justin James

The nice thing about doing it in hosts.allow is that it is a much more universal system, as opposed to doing it within an individual application. This way, if you ever decide to switch applications like the firewall or FTP or SSH server, you do not need to attempt to recreate the code. The crontab approach, while it works, is definitely not ideal, as you are well aware. J.Ja

DanLM

I was doing some googling on ssh brute force attacks for chits and giggles, and I think I just found another way to launch my scripts. The spawn command in the hosts.allow. Have to do some modifications to my scripts, but it would be an instant launching of them at the time of attempted ftp and ssh logins. I really want to remove the execution of my scripts from the crontab and shut the attacks down as soon as they happen. Multi layered approach to defensive measures is always a good thing. Sorry, I got off topic here. Dan

DanLM

I first noticed brute force attacks against my ssh a year or so ago. I wrote that log parsing routine in a shell script, which then writes the IP to a firewall table file. I then refresh the firewall rules, and the attack is blocked. By the way, I don't purge that table. Last week, I was poking around my logs and started looking at my ftp logs. That's when I noticed these twits were pounding me there also. I wrote a different script to deal with this and used Perl. Again, I write the IPs to a table file and refresh the firewall. The nice thing is, even if they guess a login, I have shut down SSH to only allow public/private key authentication. They can kiss my .... Never mind, you get the point. Chuckle, by the way, I have a URL, if you're interested, that posts any updates to China/Korea CIDRs that have been identified. I actually upgraded to FreeBSD 6.1 because it had a later port of the pfctl firewall. This version of pfctl allows for throttling: (max-src-conn 10, max-src-conn-rate 5/20, overload flush) This specific line is my friend. This shuts down an attack right away, or reasonably so. At least it's quicker than every 2 minutes. I use ncftpd as my ftp daemon, and I just found that I can write a plug-in for it in Perl that will be launched on events that I define. This is going to be my next level of protection; I'm going to shut these idiots down quick at the ftp level. The reason I asked about pfctl: I know, or at least I'm pretty sure, that you can launch an application based on a rule set. If that's the case, I'll write a Perl plug-in to be launched whenever someone logs in via ssh. And I'll bail all over these pricks at login time. This irritates the crap out of me, Justin. It's a bloody home server, that's all. And I'm getting hit on a daily basis. I found that most of the attacks occur from China/Korea. I now block all of those nations' CIDRs, which cut back on what I would call worrisome attacks. But I dump my logs every day. 
And I have at least 10 attempts EVERY DAY. Every single flippen day. And I'm probably averaging a new IP every couple days that I hadn't blocked before. This is through FTP though, and that may slow down real soon when they realize I'm covering that area also. I have some friends that run IPCop; they like it a lot. I just never had the extra machine. And I didn't think about it when I upgraded my bsd machine to a 1.4, or else I would have made the old machine (600MHz) an IPCop machine. I think I only have 3 ports open to that machine from my router. It scares me to think what I would be dealing with here if I had more open. I'll be looking forward to the articles, Justin. Thanks again. [i]Edited to correct my hurried posts from work[/i] Dan
