Tech Tip: Use awk to format output

Many Linux administrators use tools such as cat and grep to view or monitor logs and other frequently changing files, but other commands, such as awk, can extract and format exactly the data you need. For instance, assume you run the Apache Web server and you've had reports of abuse coming from IP addresses that begin with "24.". You could use grep to find all hits coming from such addresses by using:

$ grep '^24\.' /var/log/httpd/access_log

This will output every line in the log file that starts with "24.". However, in the investigation stage, all you might want is a list of the IP addresses; you don't necessarily need to know what files they loaded or what browser they are using. The easiest way to get just the IP addresses from the output would be to use awk:

$ grep '^24\.' /var/log/httpd/access_log | awk '{print $1}' | sort -u

This command uses awk to print out the first field in the output from grep; in this case, the first field is the IP address. Once you have retrieved just the IP addresses using awk, pipe that output into sort. Using the -u option with sort, the list you finally see is sorted and unique; no duplicate IP addresses are listed.
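Because awk can match patterns itself, the grep step can also be folded into the awk command. The following is a minimal sketch that runs against a small made-up log file rather than a real access_log; the sample entries and the temporary file are hypothetical and exist only for illustration:

```shell
# Create a small sample log (hypothetical entries, for illustration only).
log=$(mktemp)
cat > "$log" <<'EOF'
24.10.1.5 - - [01/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512
10.0.0.7 - - [01/Jan/2024:10:00:01 +0000] "GET /about.html HTTP/1.1" 200 128
24.10.1.5 - - [01/Jan/2024:10:00:02 +0000] "GET /logo.png HTTP/1.1" 200 2048
24.99.3.2 - - [01/Jan/2024:10:00:03 +0000] "GET /index.html HTTP/1.1" 404 0
EOF

# awk selects the lines that start with "24." and prints only the first field;
# sort -u then removes the duplicate addresses.
ips=$(awk '/^24\./ {print $1}' "$log" | sort -u)
echo "$ips"

# Remove the temporary sample file.
rm -f "$log"
```

With this sample data, the command should list 24.10.1.5 and 24.99.3.2 once each, even though the first address appears twice in the log.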

This is perhaps one of the simplest uses for awk, but it's a very handy way of retrieving a particular field from a file. The '{print $1}' component of the command determines which field awk prints; in this case, $1 refers to the first field. By default, awk splits each line into fields on whitespace. If the IP address were consistently the third field in the file, you would use:

awk '{print $3}'
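As a quick illustration, the sketch below feeds awk two lines in an invented log format where the IP address happens to be the third field. The format and the sample values are assumptions made up for this example, not an actual Apache layout:

```shell
# Hypothetical log format in which the IP address is the third field.
sample='2024-01-01 10:00:00 24.10.1.5 /index.html
2024-01-01 10:00:02 24.99.3.2 /logo.png'

# awk splits each line on runs of spaces or tabs by default,
# so $3 is the third whitespace-separated field: the IP address here.
third=$(printf '%s\n' "$sample" | awk '{print $3}')
echo "$third"
```

Changing the number after the dollar sign is all it takes to select a different field.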

The awk command is a programming language in its own right, and you can build powerful scripts with it to perform some very specialized operations. However, it's important to remember that you can also use awk to perform some very simple, yet useful, tasks.
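As one small step beyond printing a field, the following sketch counts how many hits each matching IP address made, which may be useful when investigating abuse. It again uses a made-up sample log (hypothetical entries and temporary file), and the array name count is simply a name chosen for this example:

```shell
# Create a small sample log (hypothetical entries, for illustration only).
log=$(mktemp)
cat > "$log" <<'EOF'
24.10.1.5 - - [01/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512
10.0.0.7 - - [01/Jan/2024:10:00:01 +0000] "GET /about.html HTTP/1.1" 200 128
24.10.1.5 - - [01/Jan/2024:10:00:02 +0000] "GET /logo.png HTTP/1.1" 200 2048
24.99.3.2 - - [01/Jan/2024:10:00:03 +0000] "GET /index.html HTTP/1.1" 404 0
EOF

# For every line starting with "24.", increment a counter keyed by the IP
# address; after the last line (END), print each count followed by its address.
# sort -rn then orders the result with the busiest address first.
counts=$(awk '/^24\./ {count[$1]++} END {for (ip in count) print count[ip], ip}' "$log" | sort -rn)
echo "$counts"

# Remove the temporary sample file.
rm -f "$log"
```

With this sample data, 24.10.1.5 should be reported with 2 hits and 24.99.3.2 with 1.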