SolutionBase: Steps to take to fix Linux when it won't start

No OS is 100 percent foolproof. Eventually, even Linux may not boot one day when you want it to. Jack Wallen shows some of the strategies you can take when your Linux workstation fails to start properly.

No matter how much you adore your Linux machine, there will come a time when you will have to rescue your installation. Yes, even a Linux machine could suffer from a disaster: Whether it's because of a corrupt video configuration, a kernel update gone wrong, or a misconfigured init script, it's inevitable. I've seen it happen on a number of occasions -- even on my own machines, mostly from corrupt X configurations -- and it's frustrating.

The best rescue plan, in my opinion, doesn't have to involve reinstalling. Sometimes the best rescue plan doesn't even involve booting up a rescue disk. This article is going to offer up some tips and tricks on how to avoid failure and help you create the tools you need to recover a dead Linux machine.

Start with the right runlevel

After installing a new Linux system, I immediately take steps to ensure disaster won't strike easily. One of the first steps is to edit the system's default runlevel. The runlevel tells the system how far to take the boot process. There are seven runlevels:

  • 0: halt (do not set initdefault to this)
  • 1: Single user mode
  • 2: Multiuser, without NFS (the same as 3, if you do not have networking)
  • 3: Full multiuser mode
  • 4: unused
  • 5: X11
  • 6: reboot (do not set initdefault to this)

Newer Linux distributions almost always default to runlevel 5 (X11), which means that your system will stop at the graphical log-in screen when boot is complete. This is fine until something (or someone) hoses your X configuration; you will then have to find another way to log in. You could press [Ctrl][Alt][F1] to switch to a text-based virtual console, but why go through that hassle? Instead, I always change my runlevel to 3 in the file /etc/inittab. The line you change is:

id:5:initdefault:

which will change to:

id:3:initdefault:

This is a very simple method of saving yourself when X doesn't work properly.
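The same edit can be scripted. What follows is a minimal sketch rather than a tool from any distribution: set_default_runlevel is a hypothetical helper name, and it assumes a SysV-style inittab containing a single id:N:initdefault: line. It keeps a backup copy (FILE.bak) so a bad edit can be undone from a rescue shell.

```shell
# Hypothetical helper: set the default runlevel in an inittab-style file.
# Usage: set_default_runlevel FILE LEVEL
set_default_runlevel() {
    file=$1
    level=$2

    # Runlevels 0 and 6 halt or reboot the machine, so refuse them.
    case $level in
        [1-5]) ;;
        *) echo "refusing runlevel '$level'" >&2; return 1 ;;
    esac

    # Keep a backup, then rewrite the initdefault line from the backup.
    cp -p "$file" "$file.bak" || return 1
    sed "s/^id:[0-6]:initdefault:/id:$level:initdefault:/" "$file.bak" > "$file"
}

# Example (as root):
#   set_default_runlevel /etc/inittab 3
```

The new default only takes effect at the next boot, so it is worth reading the file back before restarting.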

Multiple kernels

The next obvious rescue aid is to always have a working kernel installed. I usually work from a kernel updated via yum. Kernels have occasionally been released with flaws that kept one or more of my machines from booting, so I always make sure I have at least one known-good kernel on a machine. A great way to handle this is to first add plugins=1 to your /etc/yum.conf file. The next step is to take this script (written by Jeremy Katz from Red Hat) and save it as n-installonly.py in /usr/lib/yum-plugins. You can change the number of kernels to retain on the system by changing the tokeep variable (default = 2).

With a known working kernel on your system, you can upgrade safely. If the new kernel is hosed, simply boot the old kernel and deal with the new one from there, whether that means removing it, recompiling it, or updating it.
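Turning on plugin support in yum.conf can be done in an editor or with a short script. The following is a sketch under stated assumptions: enable_yum_plugins is a hypothetical helper name, and the file is assumed to have the usual [main] section header at the start of a line. It only sets plugins=1; it does not install the kernel-retention plugin itself.

```shell
# Hypothetical helper: make sure plugins=1 is set in a yum.conf-style file.
# Usage: enable_yum_plugins FILE
enable_yum_plugins() {
    conf=$1

    # Drop any existing plugins= line, then add plugins=1 right after [main].
    awk '
        /^plugins=/ { next }
        { print }
        /^\[main\]/ { print "plugins=1" }
    ' "$conf" > "$conf.new" || return 1

    # Overwrite in place so the original file keeps its owner and mode.
    cat "$conf.new" > "$conf" && rm -f "$conf.new"
}

# Example (as root):
#   cp -p /etc/yum.conf /etc/yum.conf.bak
#   enable_yum_plugins /etc/yum.conf
```

Before any kernel upgrade, comparing the output of uname -r with the kernels your package manager lists is a quick way to confirm that a second, known-good kernel is still installed.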

Rescue mode

If you are using Red Hat and the LILO boot loader, you can boot into rescue mode by booting from Disk 1 of your installation set and entering linux rescue at the boot prompt. Once the machine has booted, you will land at a bash prompt. From this mode, you have a number of tools to use.

Rescue mode includes tools to check the integrity of a hard disk, repair hard disks, check kernel modules, mount devices, create file systems, and more. This is a very good place to start your rescue attempt (if you're using a Red Hat, or Red Hat-based, system).

The next rescue method is booting into single-user mode, where your computer boots to runlevel 1. Your local file systems will be mounted, but your network will not be activated. You get a usable system maintenance shell. To boot into single-user mode, enter either:

linux single

or

linux emergency

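at the LILO prompt. The emergency option is more minimal still: it mounts only the root file system, read-only, and sets up almost nothing else.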

Creating a rescue CD

If you're using the LILO boot loader, there's a great tool called mkrescue. This tool is typically used to create boot floppies, but can be used to create ISOs as well. Here's how.

If you're using Mandriva:

As root:

mkrescue --iso --initrd /boot/initrd-KERNEL-NUMBER.img --kernel /boot/vmlinuz-KERNEL-NUMBER

Note: KERNEL-NUMBER is the actual release number of the kernel.

If you're unsure what kernel release you're using, the number used in the initrd and vmlinuz file names can be found with the following command:

uname -r
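The release lookup and the mkrescue call can be tied together. This is a sketch, not part of mkrescue: build_mkrescue_cmd is a hypothetical helper that assumes the initrd-RELEASE.img and vmlinuz-RELEASE naming used in the command above. It only prints the command after checking that both files exist, so nothing is written until you run the printed line yourself as root.

```shell
# Hypothetical helper: print the mkrescue command for a given kernel release.
# Usage: build_mkrescue_cmd [RELEASE] [BOOTDIR]
# RELEASE defaults to the running kernel, BOOTDIR to /boot.
build_mkrescue_cmd() {
    kver=${1:-$(uname -r)}
    bootdir=${2:-/boot}
    initrd="$bootdir/initrd-$kver.img"
    kernel="$bootdir/vmlinuz-$kver"

    # Fail early if either file is missing for this release.
    for f in "$initrd" "$kernel"; do
        [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
    done

    echo "mkrescue --iso --initrd $initrd --kernel $kernel"
}

# Example:
#   build_mkrescue_cmd            # uses the running kernel and /boot
```

If the helper reports a missing file, your distribution probably names its initrd differently, and the paths need to be adjusted by hand.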

After running the command, you will find a rescue.iso file in the directory where you ran the mkrescue command. You can now burn the image with the following commands:

First, check for the number of the CD burn device with:

cdrecord -scanbus

Now burn the image with:

cdrecord dev=0,0,0 rescue.iso

Note: Replace dev=0,0,0 with the device number reported by the scanbus command above.

If you're using Slackware, use these steps to make a boot CD:

mkrescue --iso

Note: Slackware automatically knows what kernel to put in the ISO.

You then burn the image the same way you burned the Mandriva image.

SystemRescueCD

SystemRescueCD is a Linux system on a bootable CD-ROM for repairing your system and your data after a crash. It also aims to provide an easy way to carry out admin tasks on your computer, such as creating and editing the partitions of the hard disk. It contains a lot of system utilities (parted, partimage, fstools) and basic utilities (editors, Midnight Commander, network tools).

It aims to be very easy to use. Just boot from the CD and you can do everything as if you had booted from a hard drive. The kernel supports the most important file systems (ext2/ext3, reiserfs, reiser4, xfs, jfs, vfat, ntfs, iso9660), as well as network file systems (Samba and NFS).

SystemRescueCD is probably the best of all the rescue systems out there. Not only can you use this rescue method from CD, but you can also place the rescue system on a USB flash drive.

To create a SystemRescueCD on a USB flash drive, you will need a drive larger than 256 MB. Download the ISO image from SourceForge and burn it onto a CD. Next, you will have to create the file system on the flash drive. Find the device name of the drive using the dmesg command, and then erase the drive with:

dd if=/dev/zero of=/dev/sda

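where /dev/sda is the actual device name of the flash drive. Double-check the name before you run the command: it destroys everything on the target, and on systems with SATA or SCSI disks, /dev/sda is usually the first hard drive.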
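Because dd overwrites whatever device it is pointed at, a small check before erasing can help. The sketch below relies on the removable flag that Linux exposes in sysfs at /sys/block/DEVICE/removable; is_removable is a hypothetical helper name, and the optional second argument (an alternate sysfs root) exists only to make the function easy to try out. Treat the result as a hint, not a guarantee: some USB hard disks report 0, and hot-swap bays can report 1.

```shell
# Hypothetical helper: succeed only if the kernel flags the disk as removable.
# Usage: is_removable DEVICE [SYSFS_ROOT]
is_removable() {
    dev=$(basename "$1")          # /dev/sda -> sda
    sysroot=${2:-/sys}
    flag="$sysroot/block/$dev/removable"

    # The flag file holds 1 for removable media and 0 otherwise.
    [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}

# Example (as root), erasing only when the check passes:
#   is_removable /dev/sda && dd if=/dev/zero of=/dev/sda
```

Comparing the size reported by dmesg with the size printed on the flash drive is a second useful sanity check.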

Now install a master boot record on the drive with:

install-mbr /dev/sda

or

install-mbr --force /dev/sda

if the command complains.

Now create the partition with parted by issuing:

parted /dev/sda
(parted) mkpartfs primary fat32 0 100% // use help or help mkpartfs command to see help
(parted) print // check if the write was ok
(parted) quit

Now that the filesystem has been created, copy the files to the flash drive from the CD burned from the SystemRescueCD image. Make sure you copy the files in the same hierarchy as they appear on the CD.

Now make the flash drive bootable with the syslinux command:

syslinux /dev/sda1

where /dev/sda1 is the actual name of the partition you created on the flash drive.

Now you have a rescue thumb drive you can carry with you all day. Hopefully, since you are running Linux, you won't need to use it that often.

Final thoughts

Linux is a very stable environment, but because there are so many systems within the system, things can go wrong. Although it's very easy to become complacent about Linux (due to its numerous strengths), it's always smart to know how to rescue a machine from an untimely demise. Of course, all of the rescue systems in the world will not recover your system 100 percent of the time, so you might want to consider implementing a disaster-recovery program for your Linux servers and desktops.

About

Jack Wallen is an award-winning writer for TechRepublic and Linux.com. He’s an avid promoter of open source and the voice of The Android Expert. For more news about Jack Wallen, visit his website getjackd.net.

17 comments
Timothy J. Bruce

Having a recovery CD (like Ubuntu, Knoppix, grml, or System Rescue CD) handy is necessary as well. That is how you can get back into a system and update grub. Remember, if you have physical access to the server, it is no longer secure. This is because anyone with a bootable Recovery CD can now update the system just as well as you. Tim

ppuru

Some notes ... 1. ---%

TheGooch1

If your Linux installation doesn't work, just install Windows. As long as you used FAT32 for your Linux partition, you can access your data files and migrate them to a more widely supported OS.

bruceslog

The article appears to be mis-labeled. This article wasn't so much about "Steps to take to fix Linux when it won't start", as titled. But rather "Things to do while Linux is running well to prepare yourself for the day when Linux won't start".

jmgarvin

Oh, and don't forget how to reinstall grub. Boot in rescue mode, then:

chroot /mnt/sysimage
grub
grub> root (hd0,0)
grub> setup (hd0)

For the root line, use whatever partition your boot is on... it'll yell at you if you're wrong; otherwise it tells you something along the lines of ext partition bla ok.

jdumont

Yum? "What's that?" one may ask. You journalists need to specify (preferably at the start) what flavor of Linux you are writing about. I use Puppy Linux and I got no Yum. Why not just head over to partimage dot org

roger_b

I think it's a good general guide overall. However, it's definitely written for somebody using RedHat or Fedora (or one of its offshoots). This probably wouldn't bother somebody who's had a decent amount of experience with Linux, but some newer users might want something a little more general that might work in Ubuntu or Suse.

ergodic

Please note: It is critical that you follow the "Find the name of the drive using the dmesg command" step in the article. Fedora-7 labels all disk partitions sdx rather than hdx. Thus, the first hard drive, labeled hda in most distros, is labeled sda in Fedora-7. If you inadvertently execute the recommended commands verbatim, you may damage your system.

rcugini

YUM is awesome !! You can always find it online for most distros. Use synaptic or YaST to find it. You can always go to the command line running as root. Then try entering: apt-get install yum Use it often. Learn everything about it.

noseda

What absolute rubbish. First of all, you never install Linux on a Fat 32 partition. This antiquated file system is no longer good for much of anything. A more widely supported OS? With a tiny bunch of people tucked away in Redmond working on the code, as opposed to the thousands all round the world supporting Linux? Besides, you've missed the point entirely. Linux is more stable to begin with. You should take back control of your computer - get Linux.

jdclyde

after you get done installing windows, how much of your data have you just over written? Funny, how quickly you went from a smartass to a dumbass? :D

jmgarvin

The old school PATA drives are still hdax

rcugini

Most versions of Linux will not use FAT32 as a file system. Puppy Linux might use Vfat and maybe FAT32. Most will not. Use your installation DVD or CD to rescue any malfunctioning machines. SuSE has upgrade, rescue or new install as options when you run its DVD. Simply Mepis 6.5 uses a live CD that can usually get online by itself or that will fix other Mepis installs on hard drives. The most common reason for Linux not booting is installing Windows after installing Linux first. If this happens, navigate to the part of the rescue process where you get to reinstall GRUB or LILO. Then reinstall whatever bootloader worked before Windows. Most just use GRUB. I would not try to install Linux on a FAT32 or NTFS partition. The best file system for Linux is ReiserFS. All versions of Linux that I use are installed on this. Currently, I'm dual booting SuSE 10.2 with Vista on a laptop and XP with Mepis 6.5 on my old Dell using a SATA 2 hard drive with a promise card.

TheGooch1

Minus any boot sector files, no Linux data will be overwritten. That is the point of using a FAT partition. Since I've already done this, there's no argument. It works.

oz_ollie

sdx indicates the drive is a serial drive so SCSI, USB and SATA drives all show as sd with "sda" the first drive, "sdb" the second drive and so on.

Dumphrey

Maybe, but in my experience, all the major players do...Suse, Fedora, Gentoo, Debian, Puppy, DSL, Knoppix, Mint, PCLinux, PCBSD...the list goes on. Tested with an 8GB flash drive formatted in fat32 (formatted after purchase with Windows Disk Manager.) "I would not try to install Linux on a FAT32 or NTFS partition." Could not agree more with this. And though Reisers is fast, there are other systems that work better in different situations. I would not bother using reisers on a boot partition, and would not use reisers on a partition for large video/image files, I would use XFS instead. But reisers is a good overall balanced system, and if I had to pick only one, that would be it.

jmgarvin

I assumed it would be a desktop computer with 1 internal drive and I haven't seen a SCSI laden desktop in a while ;-) I totally spaced USB drives though....Good thought!