Extract text with awk

Like sed, awk can be used to transform text. Awk is both a general-purpose text transformation tool and a programming language in its own right, and it is especially useful in scripts and on the command line.

The best way to illustrate the power of awk is with examples, so let's go:

$ printf "line one\nline two\n" | awk '{print $2, $1}'

one line

two line

The above transposes the two words on each line. Awk splits each line into fields on white space and assigns them to numbered variables: the first field to $1, the second to $2, and so on. Here it takes "line one" and prints "one line" by listing the fields in the opposite order. Note that if you use print $2 $1 instead of print $2, $1, the two fields are concatenated with nothing between them, giving oneline.
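The field handling described above can be seen side by side in a minimal sketch. The sample input, the colon-delimited record, and the variable names are made up for illustration; -F, OFS, and NF are standard awk features.

```shell
# Sample input is made up for illustration.
swapped=$(printf "line one\n" | awk '{ print $2, $1 }')   # comma: fields joined by OFS
joined=$(printf "line one\n" | awk '{ print $2 $1 }')     # no comma: plain concatenation
# -F sets the input field separator; OFS sets what the comma prints.
mapped=$(printf "root:x:0\n" | awk -F: -v OFS=' -> ' '{ print $1, $3 }')
# NF is the number of fields, so $NF is the last field on the line.
last=$(printf "line one extra\n" | awk '{ print NF, $NF }')
printf '%s\n' "$swapped" "$joined" "$mapped" "$last"
```

Run as written, this should print one line, oneline, root -> 0, and 3 extra, each on its own line.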

You can also use awk to count the lines that contain a pattern, for instance:

$ printf "line one\nline two\nline three" | awk '/line/ { ++x } END { print x }'

3

$ printf "line one\nline two\nline three" | awk '/t/ { ++x } END { print x }'

2

Here awk reads the output of the printf statement and, in the first instance, looks for lines containing the string line. It increments the variable x once for every matching line (not for every occurrence within a line) and, at the end of processing, prints the value of x. In the first instance, line was found on three lines; in the second, the string t was found on only two lines.
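Because the pattern rule fires once per matching line, a line that contains the pattern several times still adds only one to the count. The following sketch contrasts the two kinds of count, assuming the pattern e and the variable names input, lines, and hits, none of which come from the examples above. It relies on gsub() returning the number of substitutions it made.

```shell
input=$(printf "line one\nline two\nline three\n")
# One increment per line that contains an "e", however many it holds.
# Adding 0 forces a numeric 0 instead of an empty string when nothing matches.
lines=$(printf '%s\n' "$input" | awk '/e/ { ++x } END { print x + 0 }')
# gsub() replaces each "e" with itself ("&") and returns how many it replaced,
# so summing its return value counts every occurrence.
hits=$(printf '%s\n' "$input" | awk '{ n += gsub(/e/, "&") } END { print n + 0 }')
echo "matching lines: $lines, total matches: $hits"
```

With this input the letter e appears on all three lines and six times in total, so the sketch should report 3 matching lines and 6 matches.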

A more practical example: Suppose the program badprog routinely causes problems on the system, but it needs to run nevertheless. Once the system load average climbs above 4.00, you want to kill the program and restart it to prevent it from hogging all the resources:


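# Act only when the 1-minute load average (first field) is above 4.
if [ -n "$(awk '$1 > 4 {print $1}' /proc/loadavg)" ]; then
    # Collect the pid (first column of ps) of every badprog process.
    pid="$(ps ax | grep badprog | grep -v grep | awk '{print $1}')"
    for x in ${pid}; do
        kill -9 "${x}"
    done
    # Restart the program in the background.
    /usr/bin/badprog &
fi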

As you can see, awk is used twice: first to print the load average only if it is greater than 4.00, and second to grab the first column of the output of ps, which is the pid. The script then iterates through all the matching pids, killing each one, and finally restarts the program in the background.
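The load check can also be expressed through awk's exit status instead of its printed output. This is only a sketch under stated assumptions: load_exceeds is a hypothetical helper, and the sample file stands in for /proc/loadavg so the example can run anywhere.

```shell
# load_exceeds FILE MAX: succeed when the first field of FILE's first line
# is numerically greater than MAX. Hypothetical helper, not part of the
# original script; FILE would normally be /proc/loadavg on Linux.
load_exceeds() {
    awk -v max="$2" 'NR == 1 && $1 > max { found = 1 } END { exit !found }' "$1"
}

# Demonstrate with a sample file in the same layout as /proc/loadavg.
sample=$(mktemp)
echo "4.25 3.10 2.05 1/123 4567" > "$sample"
if load_exceeds "$sample" 4; then
    echo "load is above 4"
fi
rm -f "$sample"
```

Setting a flag and exiting from the END rule means an empty or unreadable file yields a failure status, so the kill-and-restart branch would not fire by accident.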

Awk is very powerful, and there is a lot that can be done with it. The examples above illustrate some of that power, but it's worth exploring awk to see all that it can offer.


Vincent Danen works on the Red Hat Security Response Team and lives in Canada. He has been writing about and developing on Linux for over 10 years and is a veteran Mac user.