
When Linux panics: Managing an OS emergency

What would you do if your Linux OS crashed? Have you made any backups? Have you even considered the possibility? Jack Wallen, Jr. has. He discusses the value of boot floppies and fsck, and lists some important files and directories.

You use Linux. Your system is virusproof, tamperproof, hackerproof, crashproof, childproof, idiotproof, practically bombproof. Your machine is up 24/7, and you know—beyond a shadow of a doubt—that it will never fail. Every precautionary measure has been implemented, and you constantly pick over your logs for every little detail. Nothing could go wrong.
That’s not necessarily true. Murphy's Law has never been more applicable than it is in the world of computers. As soon as you think that you've locked down a system, bitter irony rears up and bites you on the nose.

Yes, even with a workhorse like Linux, a user can suffer from the ultimate misfortune of a system failure or file-system crash. When you’re working in a Windows environment, there are various tools for recovery (such as defrag, scandisk, disk doctors, and recovery programs). What about Linux? What can a user do when a bad crash brings a system (and its user) to its knees? In this Daily Drill Down, I'll explain how you can create a boot floppy (post-install) and use fsck, and I'll list the files and directories that you should back up in case your Linux computer ever goes down (and you don't have access to a CD burner or a large tape drive that could back up your entire system).

Boot floppies
During the install process, most distributions ask users if they want to create a boot floppy. I’ll say this only once: It’s critical that you create a boot floppy! Creating a boot floppy will save you hours of frustration and the pain of having to reinstall your OS. If you have opted to skip this step in the install, however, don’t fret. You still can create a Linux boot floppy post-installation.

Before creating the boot floppy, you have to make sure that the following command:
rdev /boot/vmlinuz

reports the correct hard drive partition. What you're looking for is the partition where “/” resides. To find out where / resides, type df. This command lists all your mounted Linux partitions. The root partition will reside on something like hda* (where * is typically an integer between 1 and 7). Once you've discovered this partition, you'll need to make sure that the vmlinuz file points to that particular location by running the command:
rdev /boot/vmlinuz /dev/hda*

where * is the number you discovered with df.
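Put together, the check looks something like the sketch below. It assumes that the second line of df's report begins with the device name (which holds for the df of the era); rdev itself must be run as root, so it appears here in comments.

```shell
# Parse the device that holds / out of df's report; the second
# line of output begins with the device name (e.g. /dev/hda5).
root_dev=$(df / | awk 'NR==2 {print $1}')
echo "Root filesystem is on: $root_dev"

# Then, as root, inspect the kernel image's root device and
# correct it if the two disagree:
#   rdev /boot/vmlinuz                 # show the current setting
#   rdev /boot/vmlinuz "$root_dev"     # point it at the right partition
```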

Now that you have made sure that your rdev is correct, you’re ready to create that boot floppy. Have a preformatted floppy (either from MS-DOS or Linux) in your machine. Mount the floppy device (by running mount /mnt/floppy) and run the following command as root:
dd if=/boot/vmlinuz of=/dev/fd0 bs=8192

What does that command mean? Well, dd is a disk duplication routine, if stands for input file, /boot/vmlinuz is the file that will be copied, of means output file (where /boot/vmlinuz will be copied to), and bs stands for block size.

Now that you have the disk, test it. Unmount your floppy (by running umount /mnt/floppy) and reboot your machine. If the boot floppy is in proper working order, your machine will run through the boot process. Please note that booting from the floppy is somewhat slower than booting from the hard drive.

This boot floppy will come to your rescue in many a disaster. For instance, as root, you might accidentally alter or erase your lilo.conf file. Your system would then be unable to map the hard drive, and the machine would be unable to boot. With a handy boot floppy, your machine would come back to its normal state in a few minutes.
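In case you ever need to reconstruct it, a minimal lilo.conf looks something like the sketch below. The disk and partition names here are assumptions; adjust them to your own system, and remember to rerun /sbin/lilo after any change so that the boot map is rewritten.

```text
# /etc/lilo.conf -- minimal sketch; device names here are examples
boot=/dev/hda          # install the boot loader in the master boot record
image=/boot/vmlinuz    # kernel image to boot
    label=linux        # name typed at the LILO prompt
    root=/dev/hda5     # partition that holds /
    read-only          # mount root read-only so fsck can run at boot
```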

fsck

fsck is a Linux utility that you can use to check and repair the ext2 file system. Many situations can force you to invoke fsck, including an unclean shutdown of the system (such as a power failure) or a system crash.

The fsck man page states: “fsck is used to check and optionally repair a Linux file system. filesys is either the device name (e.g. /dev/hda1, /dev/sdb2) or the mount point (e.g. /, /usr, /home) for the file system. If this invocation of fsck has several filesystems on different physical disk drives to check, then fsck will try to run them in parallel. This reduces the total amount of time it takes to check all of the filesystems, since fsck takes advantage of the parallelism of multiple disk spindles.”

Only recently did I come across a problem that required me to invoke fsck. I leave my system up 24/7 (Linux likes that), and I rarely have problems. One day, however, I came home to find that all of the clocks were blinking their warnings that a power failure had occurred. I found my main machine at an unfamiliar prompt; it asked root to run fsck manually because of disk errors. I sat down, typed in the root password, and ran fsck. Sure enough, the power failure had caused some bad or duplicate blocks to occur. Panic was my first thought, but interactive fsck came to my rescue. After running an interactive fsck session, I was able to recover the system and get it booted.

This tale is not as uncommon as I’d like to think. Nor is this tale of woe a worst-case scenario; however, my story proves that, without knowing the tools of the trade, a Linux user or systems administrator could become overwhelmed with reinstallations or perpetual re-imaging. fsck takes care of these possible disasters. But don’t mistake fsck for a magic cure for poor maintenance and administration.

Before I describe fsck's ability to save a file system, let's look at its ability to check one. It's sometimes necessary to check the Linux file systems for consistency. Think of this check as similar to MS-DOS's scandisk. fsck runs a scan of the entire file structure and reports its findings. Typically, the findings pertain to noncontiguous (fragmented) blocks of data. The operating system takes care of these errors; you don't need to concern yourself with defragmenting a Linux system. The ext2 file system was designed to be smarter than the MS-DOS or vfat file systems: it allocates blocks so that a file's data is kept together on the device (such as a hard drive, a floppy, or a tape) rather than scattered across it in fragments. Of course, there are other errors, including bad or duplicate blocks and wrong block counts.

To run a check on your system, drop out of X Windows and log in as root at the console. You'll be running the check on a file system that's already mounted, so you'll receive a nasty warning when you attempt it. If any errors are found (and corrected), you'll have to reboot the system. (Don't proceed if you're wary of disturbing a critical system.) Generally, there should be no problems with this check, but there are exceptions. As root (and in console), type:
/sbin/fsck -t ext2 /dev/hda5

and you will begin the following session:
Parallelizing fsck version 1.14 (9-Jan-1999)
e2fsck 1.14, 9-Jan-1999 for EXT2 FS 0.5b, 95/08/09
/dev/hda5 is mounted.
####WARNING!!! Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.###
Do you really want to continue (y/n)? yes
/dev/hda5 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 164077 has zero dtime. Fix<y>? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -2071 -2072 -2073 -2074 -2075 -2076 -2077 -91734 -91735 -91736 -91737 -91827 -91828 -91829 -91830 -91831 -91832 -93664 -93665 -93666 -93667 -93668 -656772 -656773 -656774 -656775 -656776 -656777 -656778 -2205683 -2205684 -2205685 -2205686 -2205687 -2205688 -2205689 -2426413 -2426414 -2426415 -2426416
Fix<y>? yes
Free blocks count wrong for group #0 (5814, counted=5821).
Fix<y>? yes
Free blocks count wrong for group #11 (5895, counted=5910).
Fix<y>? yes
Free blocks count wrong for group #80 (0, counted=7).
Fix<y>? yes
Free blocks count wrong for group #269 (6200, counted=6207).
Fix<y>? yes
Free blocks count wrong for group #296 (6, counted=10).
Fix<y>? yes
Free blocks count wrong (1005747, counted=1005787).
Fix<y>? yes
Inode bitmap differences: -164077
Fix<y>? yes
Free inodes count wrong for group #80 (1809, counted=1810).
Fix<y>? yes
Free inodes count wrong (673732, counted=673733).
Fix<y>? yes

Once this check is run (and the above check ran into some errors), you’ll receive a report that’s similar to the following:
/dev/hda5: ***** FILE SYSTEM WAS MODIFIED *****
/dev/hda5: 88123/761856 files (1.5% non-contiguous), 2038341/3044128 blocks

If you receive a report that says, “FILE SYSTEM WAS MODIFIED,” then it’s very important that you reboot the machine.

You can run this type of check on a regular basis, but it isn't necessary. One of the ways in which Linux is smarter than many other OSs is that it allows a file system only so many mounts (one per boot, typically) before it forces this check to occur. Once you have reached that limit, you'll receive a message to the effect of "Maximum mount count reached. Check forced." It's standard operating procedure, and you should allow the check to run its course.

But what if you run into a problem and can't boot your system? Generally (if the problems are block errors), you'll be prompted to log in as root and run a check manually. This warning, however, doesn't give you all the details! Although the first check you should run is a standard check (simply typing fsck as root), it's important that you also run an interactive check, just as we did above. An interactive check allows the user (the superuser, that is) to answer questions in order to fix problems. The standard fsck will report whether there are any block-type errors in need of repair; if there are, the interactive check should take care of them.
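Before committing to repairs, you may want a dry run. e2fsck's -n flag opens the file system read-only and answers "no" to every question, so you can preview the damage without changing anything. Here's a sketch; the partition name is an assumption, and the last three lines rehearse the procedure on a harmless file-backed image so no real disk is at risk.

```shell
PATH="$PATH:/sbin:/usr/sbin"   # e2fsprogs tools often live in /sbin

# Preview problems on a real partition without fixing anything
# (read-only; answers "no" to every prompt -- partition name assumed):
#   fsck -n -t ext2 /dev/hda5

# Rehearse on a throwaway file-backed image instead:
dd if=/dev/zero of=/tmp/rehearse.img bs=1024 count=1024  # 1 MB scratch file
mke2fs -F -q /tmp/rehearse.img                           # lay down ext2 on it
fsck -n /tmp/rehearse.img                                # clean-check it
```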

What to back up

Not everyone has access to a tape drive, a CD burner, or a second hard drive for backing up an entire file system. In the event of a major crash (one that renders a system inoperable), the next best thing is the ability to reinstall the OS and plug in critical configuration files. But which files? Which directories? What's the best strategy for creating a backup system in which, outside of the OS installation CD, floppies are the primary means of restoration? The first step is to determine what's crucial. Is your system on a network? Does it talk to other machines? Do you have desktop configurations? Desktop pictures? Does your system hold the latest upgrades?

Regardless of what's important to your system, let's look at some of the more common files that you need to back up for emergency purposes. I'll break them into categories: networking, desktop-ing, program-ing, and various-ing.

Networking

The network backups should consist of various components, including host files, export files, print-sharing files, Samba files, and what I'll call rc files. The primary directory that houses these files is the /etc directory. There are two ways in which you can conduct this backup. The first involves the tar command. tar doesn't compress files (other tools do that), and the archived /etc is too large to fit on a single floppy, so tar will span it across several disks. The second method involves picking and choosing which files are necessary in an emergency.

The tar command, the first method, allows the user to take a directory and its contents (including subdirectories) and to pack them into a single file. This command comes in handy when you have to make restorations. For our purposes, I’ll take our /etc directory and pack it into a single file. We’ll compress it and (I hope) fit it onto one floppy. First, log into root and run the command:
tar cvMf /dev/fd0 /etc

Since the M (multivolume) option has been added to the above command, you will be prompted to insert additional floppies when appropriate; the /etc directory will fill approximately three of them. Label those floppies correctly. You want to be sure that you don't get them out of order when you're untarring the directories.

What's deceiving about this process is that, once it has completed, you'll see nothing if you try to examine the contents of the disks; tar writes its archive to the raw device, so there's no file system to browse. Instead, when and if you must use the disks, you simply run:
tar xvf /dev/fd0
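If you're nervous about whether the disks really hold your files, tar's t (list) mode shows an archive's contents without extracting anything. The sketch below rehearses the whole cycle against an ordinary file standing in for the floppy; all the paths are invented for the demonstration, and you'd substitute /dev/fd0 for the real thing.

```shell
# Stand-in for the floppy: an ordinary archive file in /tmp
mkdir -p /tmp/demo/etc /tmp/demo/restore
echo "127.0.0.1 localhost" > /tmp/demo/etc/hosts

tar cvf /tmp/demo/backup.tar -C /tmp/demo etc       # create (cf. tar cvMf /dev/fd0 /etc)
tar tvf /tmp/demo/backup.tar                        # list contents -- verify the backup
tar xvf /tmp/demo/backup.tar -C /tmp/demo/restore   # extract somewhere safe
```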

If you don’t want to back up the entire /etc directory, you can pick and choose the files that you want to copy. From the /etc directory, you’ll want to make sure that you copy the following files:
  • exports
  • hosts
  • hosts.allow
  • hosts.deny
  • hosts.equiv
  • lilo.conf
  • printcap
  • passwd
  • resolv.conf

Within the /etc directory, you'll want to go into the rc.d subdirectory and copy rc.local.

Also within the /etc directory lies the sysconfig subdirectory, which houses a number of files that you should copy, including:
  • init
  • mouse
  • soundcard

Most of the above files will be taken care of if a reinstall takes place. The soundcard file, however, will not be taken care of.

Within the sysconfig subdirectory lies yet another subdirectory called network-scripts. There, you'll find configuration scripts for network devices, such as modems and Ethernet cards. Within this directory, you ought to copy the following:
  • ifcfg-eth*
  • ifcfg-ppp*

The “*” symbol denotes a number that’s assigned when the device is configured. You’ll see something like ifcfg-eth0 and ifcfg-ppp0.
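For reference, a Red Hat-style ifcfg-eth0 of this era looks something like the sketch below. The addresses are invented for illustration; yours will differ.

```text
# /etc/sysconfig/network-scripts/ifcfg-eth0 -- hypothetical example
DEVICE=eth0              # interface this script configures
ONBOOT=yes               # bring it up at boot
BOOTPROTO=static         # static address rather than DHCP
IPADDR=192.168.1.10      # example address
NETMASK=255.255.255.0
```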

Now that you have all of the necessary /etc files backed up, you can move on to the desktop.

Desktop-ing

The desktop is a bit more complex because of the amazing number of differences among Linux desktops. The many combinations of desktop environments and window managers make any sort of definitive list almost impossible. However, I'll stick to two basic desktop environments: GNOME and KDE.

Backing up the GNOME desktop environment is a bit tricky because of the differences between releases. So, I'll stick with the more common files. The main directory is ~/.gnome. Within this directory, you’ll find the following files that should be backed up (understand that many of these files are purely aesthetic in nature):
  • Background
  • Desktop
  • Gnome
  • Screensaver
  • metadata.db
  • session

Within this directory is a window manager subdirectory called wm-properties. It contains the configurations for your window manager, and it may be different from my example, depending on which release you use. Within my wm-properties subdirectory, I have the following files:
  • AfterStep.desktop
  • Config

The Config file is the file that your GNOME session reads to determine which WM is being used. From this directory, you’ll want to copy every file that relates to your window manager.

KDE makes it a bit easier on the user by placing all the configuration files in a directory called ~/.kde/share/config. To back up KDE, copy all of these files (and there are quite a few) onto a floppy. Along with these files, you'll want to back up your ~/.kderc file.

Of course, to ensure a complete backup, both of these desktop environments can be backed up with the tar method described above. For GNOME, you run the following command (where USERNAME is the user's login name):
tar cvMf /dev/fd0 /home/USERNAME/.gnome

You’ll be sure to catch all the files that you need. The same holds true for KDE:
tar cvMf /dev/fd0 /home/USERNAME/.kde

Program-ing

No, I'm not talking about C++ or Java. Every installation has various programs that are essential to your computing survival. Each program, in turn, has configuration files that are just as essential. Bookmarks, setup configurations, and directory listings are vital to making a program's reinstallation painless.

Naturally, describing which files must be backed up for every program would be impossible. Even describing which files should be backed up for the most common applications would be lengthy, at best. So, what I'll do is list a few rules of thumb that will help you decide what should be backed up.

The resource configuration files (rc files) of an application hold crucial user-defined information. Many Linux applications keep such files, and they should be backed up. Spotting rc files is very simple: they are hidden files (their names begin with a period), and their names end in rc.
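A quick way to round up the candidates is a find across the top of the home directory, as in this sketch (the -maxdepth option assumes GNU find):

```shell
# List hidden files whose names end in "rc" in the home directory
find "$HOME" -maxdepth 1 -type f -name '.*rc'
```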

Many applications (StarOffice 5.1, for example) contain entire directories that house configuration information. Check within the main directory of the application in question. If there’s a configuration directory, run the tar command (as shown above) on the entire config directory and save it to a disk.

Other crucial directories that you should back up are any bin directories that were created by an application. These bin directories often contain .res files (resource binary files) and .bin files (binary files).

Most browsers are universal, and the bookmark files are fairly similar. Linux's Netscape application contains the same bookmarks.html file that its Windows counterpart contains. Is this backup crucial? It depends upon the collection of bookmarks that your browser holds. I have over 70 bookmarks that I'd rather not lose! With a simple cp command, those bookmarks are safe from destruction, unless the floppy on which I save them meets an untimely demise. Keeping with the Internet theme: within the user's home directory, such applications as pine will drop .addressbook files that, though not crucial, could save a great deal of time when a restoration of the OS occurs.

To err on the safe side is the best cautionary advice you can heed when determining backup strategies. Finally, if you see a hidden file in a user's home directory, back it up.

Various-ing

It's an odd term, but there are many other types of files in Linux that should be considered carefully as backup candidates. Some of these files are user specific and shouldn't be thought of as part of a "global strategy." However, backing up the following files will make your life a great deal easier when you're restoring a machine.

Within the user's home directory, there are files that the system reads when it starts such things as X Windows or a terminal window. These files, though simple to recreate, can become a hassle for an administrator. (Imagine having to recreate, from memory, aliases for 50+ users!)
  • .Xauthority
  • .Xdefaults
  • .bash_logout
  • .bash_profile
  • .bashrc
  • .xinitrc

The majority of these files define either the user's X Windows session or the BASH environment (all aliases and $PATH variables).

This list is by no means an exhaustive backup strategy. It’s merely a guide for preparing a solution-based model. With a minimum of hardware and a maximum of patience, it will save a system. Of course, the best solution would be perfect maintenance, but we know that such a solution is far from possible. Eventually, some disaster will strike, but (let’s hope) you will be prepared.

Jack Wallen, Jr. is very pleased to have joined the TechRepublic staff as editor in chief of Linux content. Jack was thrown out of the "Window" back in 1995, when he grew tired of the "blue screen of death" and realized that "computing does not equal rebooting."



