SolutionBase: Steps to take to fix Linux when it won't start

No OS is 100 percent foolproof, and one day even Linux may refuse to boot when you need it. Jack Wallen shows some of the strategies you can use when your Linux workstation fails to start properly.

No matter how much you adore your Linux machine, there will come a time when you will have to rescue your installation. Yes, even a Linux machine can suffer a disaster, whether from a corrupt video configuration, a kernel update gone wrong, or a misconfigured init script. I've seen it happen on a number of occasions (even on my own machines, mostly from corrupt X configurations), and it's frustrating.

The best rescue plan, in my opinion, doesn't have to involve reinstalling. Sometimes the best rescue plan doesn't even involve booting up a rescue disk. This article is going to offer up some tips and tricks on how to avoid failure and help you create the tools you need to recover a dead Linux machine.

Start with the right runlevel

After installing a new Linux system, I immediately take steps to ensure disaster won't strike easily. One of the first steps is to change the system's default runlevel. The runlevel tells the system how far to take the boot process. On Red Hat-style systems, there are seven runlevels, numbered 0 through 6:

  • 0: halt (do not set initdefault to this)
  • 1: Single user mode
  • 2: Multiuser, without NFS (the same as 3, if you do not have networking)
  • 3: Full multiuser mode
  • 4: unused
  • 5: X11
  • 6: reboot (do not set initdefault to this)

Newer Linux distributions almost always default to runlevel 5 (X11), which means that your system stops at the graphical log-in screen when boot is complete. This is fine until something (or someone) hoses your X configuration; then you have to find another way to log in. You could press [Ctrl][Alt][F1] to switch to a text-based virtual console, but why go through that hassle? Instead, I always change my default runlevel to 3 in the file /etc/inittab. The line to change is:

id:5:initdefault:

which will change to:

id:3:initdefault:

This is a very simple method of saving yourself when X doesn't work properly.
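
If you would rather script the change (for example, as part of a post-install routine), the edit can be sketched as a small shell function. This is a minimal sketch, assuming a SysV-style /etc/inittab with a single id:N:initdefault: line; the function name set_default_runlevel is hypothetical, and the optional second argument exists only so you can try it on a copy of the file first.

```shell
# Sketch: change the default runlevel in a SysV-style inittab.
# Assumes the file contains a line of the form "id:N:initdefault:".
# set_default_runlevel is a hypothetical helper; the optional second
# argument lets you try it on a copy instead of /etc/inittab.
set_default_runlevel() {
    level="$1"
    file="${2:-/etc/inittab}"

    # Refuse 0 (halt) and 6 (reboot); inittab warns against both.
    case "$level" in
        1|2|3|4|5) ;;
        *) echo "refusing runlevel '$level'" >&2; return 1 ;;
    esac

    # Keep a backup so a mistake can be undone from single-user mode.
    cp "$file" "$file.bak" || return 1

    # Rewrite only the initdefault line; every other line is copied unchanged.
    sed "s/^id:[0-9]:initdefault:/id:${level}:initdefault:/" "$file.bak" > "$file"

    # Succeed only if the new default is actually in place.
    grep -q "^id:${level}:initdefault:" "$file"
}
```

Run it as root with set_default_runlevel 3; the original file is kept as /etc/inittab.bak.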

Multiple kernels

The next obvious rescue aid is to always have a working kernel installed. I usually keep my kernel updated via yum, and kernels have occasionally been released with flaws that kept one or more of my machines from booting. To this end, I always make sure I have at least one known-good kernel on a machine. A great way to handle this is to first add plugins=1 to your /etc/yum.conf file. The next step is to take this script (written by Jeremy Katz of Red Hat) and save it as n-installonly.py in /usr/lib/yum-plugins. You can change the number of kernels retained on the system by changing the tokeep variable (default = 2).
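
The yum.conf step can be automated too. This is a minimal sketch, assuming a standard yum.conf with a [main] section; enable_yum_plugins is a hypothetical helper name, and it only sets plugins=1 (it does not install the plugin script itself).

```shell
# Sketch: make sure yum.conf has plugins=1, adding it under [main] if absent.
# enable_yum_plugins is a hypothetical helper; the optional argument lets you
# try it on a copy instead of /etc/yum.conf.
enable_yum_plugins() {
    conf="${1:-/etc/yum.conf}"
    cp "$conf" "$conf.bak" || return 1

    if grep -q '^plugins=' "$conf.bak"; then
        # A plugins= line exists: force its value to 1.
        sed 's/^plugins=.*/plugins=1/' "$conf.bak" > "$conf"
    else
        # No plugins= line: append one right after the [main] header.
        # (The text for sed's "a" command must start at column 0.)
        sed '/^\[main]/a\
plugins=1' "$conf.bak" > "$conf"
    fi

    # Succeed only if the setting is now present.
    grep -q '^plugins=1$' "$conf"
}
```

Writing through a backup copy (rather than moving a temporary file into place) keeps the original file's ownership and permissions, and leaves /etc/yum.conf.bak behind in case you need to undo the change.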

With a known working kernel on your system, you can upgrade safely. If the new kernel is hosed, simply select the old kernel from the boot menu and boot it, then deal with the new kernel (remove it, recompile it, or update it).
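
Before upgrading, it's worth confirming that a fallback kernel really is installed. The sketch below uses a hypothetical helper, list_kernels (not a standard tool), and assumes kernel images are named vmlinuz-RELEASE in /boot, as on Red Hat and Mandriva. It marks the running kernel and returns a failure status if fewer than two kernels are present.

```shell
# Sketch: list the kernel images in a boot directory and flag the running one,
# so you can confirm a fallback kernel exists before upgrading.
# Assumes images are named vmlinuz-RELEASE (Red Hat and Mandriva convention).
list_kernels() {
    bootdir="${1:-/boot}"
    running="${2:-$(uname -r)}"
    count=0
    for image in "$bootdir"/vmlinuz-*; do
        [ -e "$image" ] || continue          # the glob matched nothing
        release="${image##*/vmlinuz-}"       # strip the directory and prefix
        if [ "$release" = "$running" ]; then
            echo "$release (running)"
        else
            echo "$release"
        fi
        count=$((count + 1))
    done
    # Fewer than two kernels means there is nothing to fall back on.
    [ "$count" -ge 2 ]
}
```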

Rescue mode

If you are using Red Hat (or a Red Hat-based distribution), you can boot into rescue mode by booting from Disk 1 of your installation set and entering linux rescue at the boot: prompt. Once the machine has booted, you will land at a root shell prompt. From this mode, you have a number of tools to use.

Rescue mode provides tools to check the integrity of a hard disk, repair hard disks, check kernel modules, mount devices, create file systems, and more. This is a very good place to start your rescue attempt if you're using Red Hat or a Red Hat-based system.

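The next rescue method is booting into single-user mode, where your computer boots to runlevel 1. Your local file systems will be mounted, but your network will not be activated, and you get a usable system maintenance shell. To boot into single-user mode, enter:

linux single

at the LILO prompt, where linux is the label of your boot entry. If even single-user mode fails, enter:

linux emergency

instead. Emergency mode skips the init scripts and leaves the root file system mounted read-only, which still gives you a shell when the startup scripts themselves are broken.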

Creating a rescue CD

If you're using the LILO boot loader, there's a great tool called mkrescue. This tool is typically used to create boot floppies, but can be used to create ISOs as well. Here's how.

If you're using Mandriva:

As root:

mkrescue --iso --initrd /boot/initrd-KERNEL-NUMBER.img --kernel /boot/vmlinuz-KERNEL-NUMBER

Note: Replace KERNEL-NUMBER with the actual release number of the kernel.

If you're unsure which kernel release you're running, the number to use in both the initrd and vmlinuz file names can be found with the following command:

uname -r
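
A small helper can fill in the release number from uname -r and check that both files exist before you run mkrescue. This is a sketch assuming the Mandriva file names shown above; mkrescue_cmd is a hypothetical helper that only prints the command, so you can review it before running it as root.

```shell
# Sketch: assemble the Mandriva mkrescue command for the running kernel.
# Assumes /boot/vmlinuz-RELEASE and /boot/initrd-RELEASE.img naming;
# both arguments are optional overrides (release, boot directory).
mkrescue_cmd() {
    release="${1:-$(uname -r)}"
    bootdir="${2:-/boot}"
    kernel="$bootdir/vmlinuz-$release"
    initrd="$bootdir/initrd-$release.img"

    # Stop early if either file is missing; mkrescue would fail anyway.
    for f in "$kernel" "$initrd"; do
        [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
    done

    # Print rather than run, so the command can be checked first.
    echo "mkrescue --iso --initrd $initrd --kernel $kernel"
}
```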

After running mkrescue, you will find a rescue.iso file in the directory where you ran it. You can now burn the image with the following commands:

First, check for the number of the CD burn device with:

cdrecord -scanbus

Now burn the image with:

cdrecord dev=0,0,0 rescue.iso

Note: Replace 0,0,0 with the bus,target,lun numbers that the scanbus command reports for your burner.
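
If you'd rather not read the scanbus output by eye, the device triple can be pulled out with awk. This sketch assumes your burner's line contains the text CD-ROM (how cdrecord typically labels optical drives) and simply takes the first such drive; find_cd_device is a hypothetical helper name.

```shell
# Sketch: print the bus,target,lun triple of the first CD-ROM line
# in "cdrecord -scanbus" output read from standard input.
# Assumes the drive's line contains "CD-ROM"; check the raw output if not.
find_cd_device() {
    # Field 1 of a device line is the triple, e.g. 0,0,0.
    awk '/CD-ROM/ { print $1; exit }'
}
```

For example, dev=$(cdrecord -scanbus 2>&1 | find_cd_device) stores the triple; check it with echo "$dev" before running cdrecord dev=$dev rescue.iso.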

If you're using Slackware, use these steps to make a boot CD:

mkrescue --iso

Note: Slackware automatically knows what kernel to put in the ISO.

Then burn the image the same way as the Mandriva image.

SystemRescueCD

SystemRescueCD is a Linux system on a bootable CD-ROM for repairing your system and your data after a crash. It also aims to provide an easy way to carry out admin tasks on your computer, such as creating and editing the partitions of the hard disk. It contains a lot of system utilities (parted, partimage, fstools) and basic utilities (editors, Midnight Commander, network tools).

It aims to be very easy to use. Just boot from the CD and you can do everything as if you had booted from a hard drive. The kernel of the system supports the most important file systems (ext2/ext3, reiserfs, reiser4, xfs, jfs, vfat, ntfs, iso9660), as well as network ones (Samba and NFS).

SystemRescueCD is probably the best of all the rescue systems out there. Not only can you use this rescue method from CD, but you can also place the rescue system on a USB flash drive.

To create a SystemRescueCD on a USB flash drive, you will need a drive larger than 256 MB. Download the ISO image from SourceForge and burn it onto a CD. Next, you will have to create the file system on the drive. Find the name of the drive with the dmesg command, and then erase the drive with:

dd if=/dev/zero of=/dev/sda

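where /dev/sda is the actual name of the drive. Be careful: on many systems /dev/sda is your main hard disk, and dd will overwrite whatever device you name without asking for confirmation.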
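
Because a typo here is so costly, you may want a guard that refuses to touch a device that is in use. This is a minimal sketch, assuming Linux's /proc/mounts; is_mounted is a hypothetical helper. It only catches devices with mounted partitions (such as the disk you booted from), so it does not prove that the device is your flash drive.

```shell
# Sketch: succeed if the device, or any partition on it, appears as mounted.
# The second argument defaults to /proc/mounts; pass another file to test.
# A prefix match means /dev/sda also matches /dev/sda1, erring toward "in use".
is_mounted() {
    dev="$1"
    mounts="${2:-/proc/mounts}"
    # Field 1 of each line is the mounted device.
    awk -v dev="$dev" 'index($1, dev) == 1 { found = 1 } END { exit !found }' "$mounts"
}
```

For example:

is_mounted /dev/sda && echo "/dev/sda is in use; do not wipe it"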

Now install a master boot record on the drive with:

install-mbr /dev/sda

or

install-mbr --force /dev/sda

if the command complains.

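Now create the partition with parted by issuing:

parted /dev/sda
(parted) mkpartfs primary fat32 0 100%
(parted) print
(parted) quit

Type help or help mkpartfs at the (parted) prompt for details, and use print to check that the partition was written correctly. Note that parted 3.0 and later no longer include mkpartfs; with those versions, create the partition with mkpart and then format it with mkfs.vfat -F 32 /dev/sda1.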

Now that the file system has been created, mount it and copy the files from the CD you burned from the SystemRescueCD image to the flash drive. Make sure you copy the files in the same hierarchy as they appear on the CD.
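
The copy itself is a single cp command. This sketch assumes the CD is mounted at /mnt/cdrom and the new partition at /mnt/usb (for example, with mount /dev/sda1 /mnt/usb); both mount points and the function name copy_rescue_files are hypothetical. It uses cp -R rather than cp -a, since FAT32 cannot store Unix ownership and cp -a would likely print errors trying to preserve it.

```shell
# Sketch: copy the whole CD tree onto the flash drive, keeping its layout.
# The default mount points are assumptions; pass your own as arguments.
copy_rescue_files() {
    src="${1:-/mnt/cdrom}"   # where the SystemRescueCD disc is mounted
    dest="${2:-/mnt/usb}"    # where the flash drive partition is mounted

    # "src/." copies the contents of src (hidden files included)
    # instead of creating a src subdirectory inside dest.
    cp -R "$src"/. "$dest"/ || return 1

    # Sanity check: every regular file on the CD now exists on the drive.
    missing=$( (cd "$src" && find . -type f) | while IFS= read -r f; do
        [ -f "$dest/$f" ] || echo "$f"
    done )
    [ -z "$missing" ] || { echo "not copied: $missing" >&2; return 1; }
}
```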

Now make the flash drive bootable with the syslinux command:

syslinux /dev/sda1

where /dev/sda1 is the partition you created on the drive. Unmount the partition before running syslinux.

Now you have a rescue thumb drive you can carry with you all day. Hopefully, since you are running Linux, you won't need to use it that often.

Final thoughts

Linux is a very stable environment, but because there are so many systems within the system, things can go wrong. Although it's very easy to become complacent about Linux (due to its numerous strengths), it's always smart to know how to rescue a machine from an untimely demise. Of course, no rescue system will recover your machine 100 percent of the time, so you might want to consider implementing a disaster-recovery program for your Linux servers and desktops.

About

Jack Wallen is an award-winning writer for TechRepublic and Linux.com. He’s an avid promoter of open source and the voice of The Android Expert. For more news about Jack Wallen, visit his website getjackd.net.
