In Linux, instances of currently running programs are referred to as processes. When you start Apache, for example, it is assigned a process ID (PID). This ID is then used to monitor and control the program.

Monitoring and controlling processes is a core responsibility of any Linux system administrator. An admin can stop (“kill”) a process, restart it, or even assign it a different priority. The standard Linux commands ps and top are commonly used to look at the current process table. I’m going to show you how to use these and other commands to manage processes on a Linux system.

Monitoring processes with ps
One of the standard tools for monitoring Linux processes is ps, which is short for process status. This command returns information on running programs. The information can include the username a program is running under, the amount of CPU it is using, and the length of time it has been running. This data can be valuable when you need to manually stop a program or if you just need to determine what program is slowing down the system.

If you issue the ps command alone, it will list only processes that are running on the current terminal. Here is an example output of ps run from a remote shell:
$ ps
  PID TTY         TIME CMD
 4684 pts/14   00:00:00 bash
27107 pts/14   00:00:00 ps

Currently, the only processes assigned to this user/terminal are the Bash shell and the ps command itself. You can see the PID (Process ID) listed for each one as well as the TTY, TIME, and CMD. TTY denotes which terminal the process is running on, TIME shows how much CPU time the process has used, and CMD is the name of the command that started the process.

As you can see, a standard ps command really just lists the basics. To get more details about the processes running on your Linux system, you will need to pass some command line arguments.

Passing ps the commonly used aux arguments will display processes belonging to all users (a), include processes that have no controlling terminal (x), and switch to a user-oriented format that adds columns such as the owning user and the start time (u).

Listing A shows an example of what the output of ps aux might look like.

There is a lot more information now. The fields USER, %CPU, %MEM, VSZ, RSS, STAT, and START have been added. Let’s take a quick look at what this tells you.

First, you now see all processes, not just the ones running on your terminal. The USER field shows you which user initiated the command. Many processes begin at system start time and often list root or some system account as the USER. Other processes are, of course, run by individuals. That information alone could help narrow down a problem. Say a user begins a script that eats up a lot of I/O on a production server. Being able to immediately tell who ran the program can speed up the time to resolution.

The %CPU, %MEM, VSZ, and RSS fields all deal with system resources. First, you can see what percentage of the CPU the process has been using. Keep in mind that ps takes a single snapshot, and on Linux its %CPU value is the CPU time the process has used divided by the time it has been running. Because that is an average rather than a live reading, short spikes can be hard to detect with ps. You may find yourself running ps commands rather frequently trying to catch a culprit process.

Along with CPU utilization, you can see current memory utilization and its VSZ (virtual memory size) and RSS (resident set size). VSZ is the total virtual memory the process has mapped, which is what it would take up if it were all in memory; RSS is the portion actually held in physical memory right now. Knowing how much a process is currently consuming will help determine if it is acting normally or has spun out of control. Programs sometimes consume more memory and CPU than they should. While programmers work hard to make sure their code handles resources well, sometimes it is up to an administrator to decide if a program needs to be stopped or restarted.
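To find the heaviest memory users without scanning the whole list by eye, you can ask ps to sort its output. The following is a minimal sketch that assumes the procps-ng version of ps found on most Linux distributions, which supports the -e, -o, and --sort options. The name top_by_rss is a hypothetical helper, not a standard command.

```shell
# Show the N processes with the largest resident set size (RSS).
# Assumes procps-ng ps (supports -e, -o and --sort); top_by_rss is a
# hypothetical helper name used only for this example.
top_by_rss() {
    count="${1:-5}"
    # -e: every process; -o: choose the columns; --sort=-rss: largest RSS first.
    # head keeps the header line plus the first $count processes.
    ps -eo pid,user,pcpu,pmem,vsz,rss,stat,comm --sort=-rss | head -n "$((count + 1))"
}

top_by_rss 5
```

Swapping --sort=-rss for --sort=-pcpu orders the same list by CPU usage instead.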

You will notice a “?” in most of the TTY fields in the ps aux output. This is because most of these programs were started at boot time and/or by init scripts. The controlling terminal does not exist for these processes; thus, the question mark. On the other hand, the command linux-sanity-check has a TTY value of pts/14. This is a command being run from a remote connection and has a terminal associated with it. This information is helpful for when you have more than one connection open to a machine and want to determine which window a command is running in.

STAT shows the current status of a process. In our example, many are sleeping, indicated by an S in the STAT field. This simply means that they are waiting on something. It could be user input or the availability of system resources. The linux-sanity-check command, however, has a status of R, meaning it is currently running or ready to run. If most processes are sleeping and there is some sort of problem, it can be best to glance through the list and focus on the R processes. That status isn’t necessarily a bad sign, but sometimes a process that has been running overly long is an indication of some deeper issue.
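Picking out the R entries can be automated with a small filter. This sketch again assumes procps-ng ps and uses -o to fix the column order so that STAT is always the second field. The name filter_by_state is a hypothetical helper introduced only for illustration.

```shell
# Print the header line plus every process whose STAT begins with the
# given state letter (R = running or runnable, S = sleeping, and so on).
# filter_by_state is a hypothetical helper; it expects STAT in column 2.
filter_by_state() {
    awk -v state="$1" 'NR == 1 || substr($2, 1, 1) == state'
}

# -e: every process; -o: choose columns so STAT is always the second one.
ps -eo pid,stat,pcpu,comm | filter_by_state R
```

The ps command itself will usually appear in the result, since it is running at the moment the snapshot is taken.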

Monitoring processes with top
Another good program to get familiar with is top. This program is similar to ps but is usually started full screen and updates continuously with process information. That makes it easier to catch programs that misbehave only occasionally, which are hard to spot in a single ps snapshot. Overall system information is also presented, which makes top a good place to start looking for problems. Information such as total system CPU and memory resources, as well as the load average, is helpful by itself. Add to this a list of programs and their current status and individual statistics, and you can see why top is a commonly used tool.
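Because top normally takes over the terminal, its output is awkward to save or pipe into other commands. A minimal sketch of one way around that, assuming the procps-ng version of top and its batch mode options, is shown here. The name top_snapshot is a hypothetical helper.

```shell
# Capture a single, non-interactive snapshot from top.
# -b: batch mode (plain text, no full-screen display)
# -n 1: produce one refresh, then exit
# Assumes procps-ng top; top_snapshot is a hypothetical helper name.
top_snapshot() {
    top -b -n 1 | head -n "${1:-15}"
}

top_snapshot 15
```

Redirecting this output to a file at intervals gives you a simple record to look back on after a problem has passed.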

Don’t forget pstree
And finally, another quick and easy tool for checking processes is pstree. This command will list current processes and their tree structure. When one process starts, it sometimes creates child processes of itself. You can easily see this when you run the command pstree:
 
$ pstree -cp 125
httpd(125)-+-httpd(126)
           |-httpd(127)
           |-httpd(129)
           `-httpd(130)
 

httpd is a good example because it will often spawn child processes. Here, we’re looking at the tree for PID 125. If you need to stop httpd but don’t want to kill all the individual children one by one, go for the parent process. The pstree command can list trees for individual processes or all the processes on the system. Not only can this help you track down misbehaving processes, but it is also useful as a learning tool. You can learn a lot about Linux by executing these commands and reading the associated man pages.
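If you already have the PID of a child and only need to know which process started it, ps can report the parent directly. This is a small sketch assuming procps-ng ps. The name parent_of is a hypothetical helper.

```shell
# Print the parent PID (PPID) of a given process.
# "-o ppid=" selects only the PPID column and suppresses the header;
# tr strips the padding spaces. parent_of is a hypothetical helper name.
parent_of() {
    ps -o ppid= -p "$1" | tr -d ' '
}

# Example: $$ is the PID of the current shell.
echo "This shell ($$) was started by PID $(parent_of "$$")"
```

In the tree above, running the same lookup on any of the httpd children would be expected to report 125.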

Managing processes
Once you have used tools such as ps and top to monitor processes, you need to know how to manage them. You can do this with commands such as kill, killall, and renice.

kill sends signals to running processes. The most obvious usage is to halt program execution. You will first need to get the PID for a running program (using ps aux, for example) and then use it in the command as follows:
$ kill 125

Under normal circumstances, this should stop the process 125. Also note that you will either need to be the owner of the process or root to halt it. And sometimes, a process will not respond to just a simple kill command. You may have to try the following:
$ kill -9 125

If the process is hung and not responding normally, you can try killing it with the -9 flag, as shown in the above example. A normal kill command sends SIGTERM, which a program can catch or ignore. The -9 flag sends SIGKILL instead, which the program cannot catch, so the kernel ends it without giving it a chance to clean up. Other signals can pause a process or, for programs that support it, prompt a reload or restart. You can list them by running the command kill -l.
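The two-step approach described above (ask politely, then force) can be wrapped in a small function. This is a sketch under stated assumptions rather than a standard tool: stop_process is a hypothetical name, and the default grace period of five seconds is an arbitrary choice.

```shell
# Ask a process to exit with SIGTERM, then fall back to SIGKILL if it is
# still alive after a grace period. stop_process is a hypothetical helper.
stop_process() {
    pid="$1"
    grace="${2:-5}"     # seconds to wait before escalating (arbitrary default)

    # Fails if the PID does not exist or you are not allowed to signal it.
    kill -TERM "$pid" 2>/dev/null || return 1

    while [ "$grace" -gt 0 ]; do
        # kill -0 sends no signal; it only checks that the PID can be signaled.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done

    # One last check, then force the issue: SIGKILL cannot be caught or ignored.
    kill -0 "$pid" 2>/dev/null || return 0
    kill -KILL "$pid"
}

# Usage: stop_process 125 10
```

Giving a program time to react to SIGTERM lets it flush files and release locks, which is why SIGKILL is best kept as the fallback.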

The command killall, while very much like kill, accepts arguments differently. Instead of passing it a PID, you can pass it a program name. All processes running with that program name will then be stopped. This applies to just the ones you own or to all of them if you are the root user. So running the command killall tcpdump will kill all instances of the program tcpdump. This is much more helpful when many processes are running under a single name.

Be sure to watch what processes you stop, especially when you are root. Killing the wrong process could close your session or even halt the system. Get familiar with the standard running programs and how they use resources. Setting a baseline is invaluable in helping to isolate system problems.
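One low-risk habit, in that spirit, is to list what a name-based kill would match before sending any signal. The sketch below uses pgrep, which is not covered above but ships in the same procps-ng package as ps on most distributions. The name preview_killall is a hypothetical helper. pgrep and killall do not match names identically in every case, so treat the list as a sanity check rather than a guarantee.

```shell
# List the processes that a name-based kill would likely hit, without
# signaling them. -x: match the exact process name; -l: print the name
# next to each PID. preview_killall is a hypothetical helper name.
preview_killall() {
    pgrep -x -l "$1"
}

# Review this output before running: killall tcpdump
preview_killall tcpdump || echo "no process named tcpdump is running"
```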

Remember when I mentioned earlier that you could change the priority of a running process? You can do this with the command renice. Changing priority tells the operating system to give a particular process more or less CPU time. On Linux, a process’s “niceness” ranges from -20 to 19, with -20 being the highest priority and 19 the lowest. Any user can lower the priority of a process he or she owns, but raising it normally requires root. So to drop httpd process 125 to the lowest priority, you could run:
$ renice +19 125

You can do this on the fly to free up CPU time for other work. The system also adjusts scheduling on its own, and a process that still dominates the CPU can indicate that one program is taking up more resources than it should.
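To see renice take effect, you can try it on a throwaway process and read the value back from the NI column of ps. This sketch assumes the util-linux renice and procps-ng ps found on most Linux systems. The sleep job and the nice_of helper are hypothetical stand-ins for a real workload.

```shell
# Print the current nice value of a process ("-o ni=" hides the header).
# nice_of is a hypothetical helper name.
nice_of() {
    ps -o ni= -p "$1" | tr -d ' '
}

# Start a throwaway background job to experiment on.
sleep 30 &
pid=$!

echo "before: $(nice_of "$pid")"
# Lowering priority (raising niceness) on your own process needs no root.
# util-linux renice treats the value given to -n as the new nice value.
renice -n 10 -p "$pid" >/dev/null
echo "after:  $(nice_of "$pid")"

kill "$pid"
```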

Speedier problem solving
The ability to monitor and control processes on your Linux system is essential. Programs such as ps, top, kill, and renice enable you to see what a process is doing and to control it. The more you know about what each process is up to, the easier it will be to pinpoint problems when they creep in. A system usually experiences problems such as slowness or instability for a reason, and using these tools should help you improve your ability to track down the causes.