Improve your scripting with AWK, part 3: Pre- and post-processing

Richard Charrington concludes his three-part series on AWK, a very powerful scripting language utility. This time, Richard discusses pre- and post-processing and pattern matching. He also provides a few extended examples of how you can use AWK.

In “Improve your scripting with AWK, part 1: An introduction to the pattern scanning and processing utility,” I presented an overview of how AWK works, and I demonstrated how you can use AWK code on the command line or in a file. I also discussed some of its problems and pitfalls. In “Improve your scripting with AWK, part 2: The language,” I described the commands and statements that are available with AWK, the constants that you’ll find, and the ways in which you can selectively process the input and format the output. Today, I’ll conclude this series with a discussion of pre- and post-processing and pattern matching, and I’ll give you a few extended examples of how you can use AWK.

Before and after
There will be times when you’ll want to do some processing or print some output before you start dealing with the source. Or you may want to do something after the entire source has been processed. You can achieve these tasks by using the BEGIN{…} and END{…} constructs, respectively. To print a heading before the output and a footer that shows the number of lines that were processed, type:
AWK "BEGIN{print header}{lno=lno+1;print lno,tab,$0}END{printf f,lno}" header="This is the heading" tab=": " f="There were %s lines processed"

Please note that the above command is a single line of code. Because no input file is named, AWK reads standard input, so pipe another command's output into it. The command also relies on the automatic initialization of variables: lno was treated as zero the first time it was used. The same is true for all variables in AWK; an uninitialized variable is null (when used in a string expression) or zero (when used in a numeric expression).
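For comparison, here is a sketch of the same heading, numbering, and footer program for a POSIX shell and a POSIX awk (gawk, mawk, or BWK awk), which is an assumption rather than the NT port used in this series. It passes the variables with -v instead of trailing var=value operands, because POSIX awk applies those operands only when it reaches them in the argument list, after BEGIN has already run, so a heading set that way would print as an empty line. It also uses %d rather than %s, so empty input reports 0 instead of a blank count. The function name number_lines is made up for illustration.

```shell
#!/bin/sh
# number_lines: copy stdin with line numbers, between a heading and a footer.
# Assumes a POSIX shell and a POSIX awk.
number_lines() {
    awk -v header="This is the heading" \
        -v tab=": " \
        -v f="There were %d lines processed" '
        BEGIN { print header }                # once, before any input is read
        { lno = lno + 1; print lno tab $0 }   # lno starts out as 0
        END { printf f "\n", lno }            # once, after the last line
    '
}
```

Fed the two lines alpha and beta, it prints the heading, then 1: alpha and 2: beta, then There were 2 lines processed. Concatenating lno tab $0 avoids the extra spaces that print inserts (the OFS value) between comma-separated items.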

Regular expressions
Regular expressions in AWK include parentheses for grouping, | for alternatives, + for "one or more," and ? for "zero or one." Character classes may be abbreviated: [a-zA-Z0-9] is the set of all letters and digits. Take the following example:
AWK "/[Aa]ndy|[Ww]illiam|[Kk]eith/"

It will print all lines that contain the names Andy, William, or Keith, whether or not the first letter is capitalized. Other characters that have special meanings include:
  • [...]: Contains a list or range of characters (e.g., [abxy] matches any of the characters that appear between the brackets; [a-d] matches any character from a to d)
  • ^: Begins with (e.g., /^[Aa]/ matches any line that begins with A or a); when ^ is the first character between the brackets, it means “any character not in the list” (e.g., [^0-9] matches any character that isn’t a digit)
  • $: Ends with (e.g., /[Aa]$/ matches any line that ends with A or a), but it doesn’t work in the NT version of AWK
  • *: 0 or more occurrences of the preceding character or expression (e.g., /an*t/ matches any line that contains a, followed by 0 or more appearances of n, followed by t; it would match at and ant, but not art)
  • +: 1 or more occurrences of the preceding character or expression (e.g., /an+t/ matches ant and annt, but not at)
  • ?: 0 or 1 occurrence of the preceding character or expression (e.g., /an?t/ matches at and ant, but not annt)
  • .: Any single character (e.g., /a....t/ matches any line that contains an a, followed by any 4 characters, followed by t)
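The alternation example and each metacharacter in the list can be tried out with one-line filters. This sketch assumes a POSIX shell and a POSIX awk such as gawk or mawk, which is why the programs sit in single quotes rather than the double quotes cmd.exe needs; the function names are invented for illustration. On such an awk, $ does anchor at the end of the line.

```shell
#!/bin/sh
# Hypothetical filters: each prints only the stdin lines that match one pattern.
find_names()     { awk '/[Aa]ndy|[Ww]illiam|[Kk]eith/'; } # | separates alternatives
starts_with_a()  { awk '/^[Aa]/'; }      # ^ outside brackets: start of line
ends_with_a()    { awk '/[Aa]$/'; }      # $: end of line
no_digit_first() { awk '/^[^0-9]/'; }    # ^ first inside brackets: negated class
zero_or_more()   { awk '/an*t/'; }       # *: zero or more n
one_or_more()    { awk '/an+t/'; }       # +: one or more n
zero_or_one()    { awk '/an?t/'; }       # ?: zero or one n
any_four()       { awk '/a....t/'; }     # .: any single character (four of them here)
```

For instance, piping the lines andy smith, KEITH, and William Tell through find_names keeps the first and third: [Kk]eith allows either case only for the first letter, so KEITH is dropped.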

Regular expressions must be enclosed by slashes. Within a regular expression, blanks and the regular expression metacharacters are significant. To turn off the magic meaning of one of the regular expression characters, precede it with a backslash. For example:
/\/.*\//

This pattern matches any string of characters that is enclosed by slashes. With the operators ~ and !~, you can specify that any field or variable matches (or doesn’t match) a regular expression. For example:
$1 ~ /[jJ]ohn/

This program prints all lines in which the first field matches john or John. Of course, it also matches Johnson, St. Johnsbury, etc. To restrict it just to John, you might try to use the following line:
$1 ~ /^[jJ]ohn$/

Unfortunately, as I mentioned earlier, the $ anchor doesn’t work in the NT version of AWK, so this restriction won’t take effect there.
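On a POSIX awk, where the $ anchor does work, the escaped-slash pattern and the ~ and !~ examples behave as described. A sketch assuming a POSIX shell; the function names are illustrative only:

```shell
#!/bin/sh
# Hypothetical filters for the escaped-slash and ~ / !~ examples.
has_slashed_text() { awk '/\/.*\//'; }          # \/ is a literal slash
first_like_john()  { awk '$1 ~ /[jJ]ohn/'; }    # also matches Johnson
first_is_john()    { awk '$1 ~ /^[jJ]ohn$/'; }  # the whole field must be john or John
first_not_john()   { awk '$1 !~ /^[jJ]ohn$/'; } # every line first_is_john rejects
```

Given the lines john 1, Johnson 2, and Mary 3, first_like_john keeps the first two, first_is_john keeps only john 1, and first_not_john keeps the other two.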

Input and output
You can pipe the result of a command to AWK, or on the command line, you can specify a file that will provide the input. You know that print and printf will produce output, and you know that you can use the > character on the command line to send the output to a file. Did you know, however, that there are ways in which AWK can dynamically alter where the input is taken from or where the output is sent?

The getline function controls where the next line of input comes from. It takes the following forms:
  • getline: Reads the next record into $0 (that way, you can work on more than one line at a time)
  • getline s: Reads the next record into s
  • getline <"file": Reads a record from "file" into $0, thus changing the input source temporarily
  • getline s <"file": Reads a record from "file" into s
  • Return value: each form of getline returns 1 when it reads a record, 0 at end of file, and -1 on an error (such as a nonexistent file)
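Two small sketches of getline, assuming a POSIX shell and awk; join_pairs and print_file are hypothetical names. The first uses plain getline to see two records in one action. The second reads a named file with getline var < file. Comparing the return value with > 0 matters there: a bare while (getline line < file) treats -1 as true and never ends when the file can't be opened.

```shell
#!/bin/sh
# join_pairs: print each pair of stdin lines on a single line.
join_pairs() {
    awk '{
        first = $0
        if ((getline) > 0)       # getline replaces $0 with the next record
            print first " " $0
        else                     # 0 at end of file: an odd line stands alone
            print first
    }'
}

# print_file FILE: copy FILE to stdout with getline var < file.
print_file() {
    awk -v file="$1" 'BEGIN {
        while ((getline line < file) > 0)   # stops on 0 (EOF) and on -1 (error)
            print line
        close(file)
    }'
}
```

Because print_file has only a BEGIN action, awk never waits for standard input; a missing file simply produces no output.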

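The print and printf statements can send their output to a file with > or >>. With >, the file is emptied the first time the program writes to it, and later writes in the same run follow the earlier ones; with >>, output is always added to the end of the existing file:
  • print > "file"
  • printf format, expression-list > "file"
  • print >> "file"
  • printf format, expression-list >> "file"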

Input and output can use the same file, provided that you call close("file") between the two: after output and before input, and after input and before output.
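The sketch below shows >, >>, and close() working on one file. It assumes a POSIX shell and awk; first_fields and add_footer are invented names, and the file name is passed in as an argument.

```shell
#!/bin/sh
# first_fields FILE: write field 1 of each stdin line to FILE, then close
# FILE and read it back to count the lines that were written.
first_fields() {
    awk -v out="$1" '
        { print $1 > out }      # first write empties FILE; later writes follow it
        END {
            close(out)          # finish output before reading
            while ((getline line < out) > 0) n++
            close(out)          # finish input before any later output
            print n + 0
        }'
}

# add_footer FILE: append one line to FILE with >>.
add_footer() {
    awk -v out="$1" 'BEGIN { print "-- end --" >> out }'
}
```

Running first_fields a second time on the same file starts it over, while add_footer leaves the existing lines in place.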

And now for the examples
Now that you understand the theory, you’re ready for some examples of how AWK can be used in real-life situations. Some of the following scripts get quite complicated, so be prepared.
All of the following examples are written as batch files. Where an AWK script is used, it’s listed below the batch file detail.
Example 1
You can use AWK to create a list of server names from the result of a command that has other textual information in the output. Here’s how:
:: Check that all servers in a domain are 'live'
:: First, pick out the server names
browstat vw \device\netbt_e100b1 | awk "{print $1}" > members.dom
:: For each entry in the output file, call a
:: subroutine
for /f %%i in (members.dom) do call :Subroutine %%i
goto :EOF

:: The following subroutine pings the server for
:: a maximum of 10 times
:Subroutine
set OK=no
set x=0
:Loop
set /a x=x+1
ping %1 -n 1 | find "Reply from" && set OK=yes
:: End if 10 pings sent
if %x%==10 goto EndLoop
:: End if server replied
if %OK%==yes goto EndLoop
goto Loop
:EndLoop
:: If server did not reply, say so
if %OK%==no echo %1 cannot be reached
goto :EOF
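The AWK step in Example 1 prints the first field of every line, including any heading or status lines the command produces. If the server names in your listing start with \\ (an assumption; check your own browstat output), a tighter filter can keep only those lines and strip the backslashes, since ping expects a host name without the \\ prefix. Here it is as a POSIX shell sketch; server_names is a hypothetical name.

```shell
#!/bin/sh
# server_names: keep lines whose first field starts with \\ and print
# that field without the leading backslashes.
server_names() {
    awk '$1 ~ /^\\\\/ {
        name = $1
        sub(/^\\\\/, "", name)   # drop the two leading backslashes
        print name
    }'
}
```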

Example 2
You can use AWK to set an environment parameter. Here’s how:
:: Use the output from the netdom command to get the
:: name of the PDC, then check whether it
:: is the server this batch file is running on
netdom master | awk "/PDC \\\\/{print c $3}" c="set pdc=" > tmp.bat & call tmp.bat
if %pdc%==\\%computername% goto Cont
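Example 2 has AWK write a tiny batch file whose only job is to set a variable. In a POSIX shell, command substitution does the same job without a temporary file. A sketch under that assumption, where pdc_name is a hypothetical function and, as in the AWK program above, the PDC name is taken to be the third field of the line that contains PDC \\:

```shell
#!/bin/sh
# pdc_name: print field 3 of the first line that contains "PDC \\".
pdc_name() {
    awk '/PDC \\\\/ { print $3; exit }'   # exit: ignore any later matches
}

# Usage (some_command is a placeholder): pdc=$(some_command | pdc_name)
```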

Example 3
You can use an AWK script to perform more complicated processing. First, here’s the batch file:
:: Get the time from the DHCP backup file
:: The AWK script below adds 20 minutes to
:: the file's time stamp and runs the 'at'
:: command
dir c:\winnt\system32\dhcp\backup\DhcpCfg /l | awk "/dhcpcfg/{print $2}" | awk TIME.awk time=20 f="at %%02d:%%02d c:\dhcpScripts\at-dhcp.bat"

Now, here's the AWK script (TIME.awk):
# --> process each time stamp piped in from dir
{
# --> separate the hours and minutes
split($0, s, ":");

# --> get the 'a' or 'p' (for am/pm) off the end of the minutes
apm = substr(s[2],3,1);

# --> remove the am/pm letter from minutes and add the increment
hr = s[1];
min = substr(s[2],1,2) + time;

# --> convert to 24-hour time: 12a becomes 0, and 1p-11p gain 12
if(apm == "a" && hr == 12) hr = 0;
if(apm == "p" && hr < 12) hr += 12;

# --> if our minutes have exceeded 59 by adding the
# --> increment, subtract 60 and increment the hour
while (min > 59){
min -= 60;
hr++;
}

# --> make sure we haven't exceeded 24 hours
if (hr >= 24) hr -= 24;
# --> format the 'at' command
# --> f = "at %02d:%02d c:\dhcpScripts\at-dhcp.bat"
c = sprintf(f,hr,min);
# for debugging, display the original time and
# the resulting command
print $0 " -> " c;
# --> execute the command
system(c);
}

Of course, if your system is configured to show time stamps in a 24-hour format, you’ll have to adjust the above script accordingly.
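To check the time arithmetic from TIME.awk without scheduling anything, the same steps can be wrapped in a function that prints only the new time. This is a sketch for a POSIX shell and awk; add_minutes is a made-up name, and the input is assumed to look like the dir time stamps above (10:30a, 11:50p).

```shell
#!/bin/sh
# add_minutes TIME MINUTES: TIME is a 12-hour stamp such as 10:30a.
add_minutes() {
    echo "$1" | awk -v time="$2" '{
        split($0, s, ":")                    # s[1] = hours, s[2] = "30a"
        apm = substr(s[2], 3, 1)             # trailing a or p
        hr = s[1] + 0
        min = substr(s[2], 1, 2) + time      # minutes plus the increment
        if (apm == "a" && hr == 12) hr = 0   # 12:xx a.m. is hour 0
        if (apm == "p" && hr < 12) hr += 12  # 1 to 11 p.m. become 13 to 23
        while (min > 59) { min -= 60; hr++ } # carry whole hours
        if (hr >= 24) hr -= 24               # wrap past midnight
        printf "%02d:%02d\n", hr, min
    }'
}
```

For example, add_minutes 11:50p 20 prints 00:10, and add_minutes 12:45p 20 prints 13:05.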

Example 4
You can use AWK to display how long ago (in hours) a file was created. First, here are the batch commands:
echo. | date /t | awk "{print $2}" > btemp.txt
echo. | time /t | awk "{print $1}" >> btemp.txt
dir \\%1.10\d$\runme.bat | awk "/\//{print $1,$2}" >> btemp.txt
awk instdt.awk btemp.txt > setit.bat & call setit.bat

Now, here’s the AWK script (instdt.awk):
#print "echo date: " td "dy " tm "mth " th "h " tmn "m";
#print "echo File: " fd "dy " fm "mth " fh "h " fmn "m";
if(fm >= 10)
ddiff=td-fd + mdiff*30;
hdiff=th-fh + ddiff*24;
#print hdiff;
if(hdiff > 12)
print "set diff=" hdiff;

Of course, there are many other tasks that you can perform with AWK. I only covered a few of the more basic scripts. Now that you have an understanding of what AWK is and what it can do for you, however, you should be able to come up with a few ideas of your own. If you want to obtain a copy of AWK, you can download it from my Web site.

Richard Charrington’s computer career began when he started working with PCs, back when they were known as microcomputers. Starting as a programmer, he worked his way up to the lofty heights of a Windows NT systems administrator, and he has done just about everything in between. Richard has been working with Windows since before it had a proper GUI and with Windows NT since it was LAN Manager. Now a contractor, he has slipped into script writing for Windows NT and has built some very useful auto-admin utilities.

The authors and editors have taken care in preparation of the content contained herein, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for any damages. Always have a verified backup before making any changes.
