Improve your scripting with AWK, part 1: An introduction to the pattern scanning and processing utility

AWK is a very powerful scripting language utility, but what does it do? In this first installment of a three-part series, Richard Charrington provides a general overview of AWK and a description of how to implement AWK under Microsoft Windows.

You can use NT’s built-in scripting language to create some very powerful scripts. As powerful as this scripting language is, however, it’s somewhat limited. If you want to create even more powerful scripts, you must use a third-party program. Let’s take a look at one very good scripting utility that you can use to beef up your scripts: AWK.

What is AWK?
AWK is a very powerful scripting language utility. Its basic purpose is to search an alphanumeric source for patterns and to perform specified actions upon lines—or elements of lines—that contain instances of those patterns. The alphanumeric source may be a file or the output from a command. AWK patterns can include arbitrary Boolean combinations of regular expressions and of relational operators on strings, numbers, fields, variables, and array elements. Actions include the same pattern-matching constructions that patterns do, along with arithmetic and string expressions and assignments; if-else, while, and for statements; and multiple output streams. AWK is a standard feature of UNIX and Linux operating systems, and it’s available as a separate download for other operating systems, including OS/2, Windows 9x, and Windows NT. You can download a free copy of AWK for Windows NT from my Web site.

How does AWK work?
So, AWK is a pattern-matching program. But what does that mean in practice? Consider the following output from the command nbtstat -a {ipaddress}:

NetBIOS Remote Machine Name Table
Name Type Status
ADOMAIN <00> GROUP Registered
ADOMAIN <1C> GROUP Registered

Using only NT commands, including those supplied in the Resource Kit, how could you create a batch file that extracts the server name and uses it to view the contents of the C: drive? You can’t, even with command extensions enabled. With AWK, however, the following commands will do what’s needed:
nbtstat -a {ipaddress} | AWK "NR==7{print s,substr($1,1,15)}" s="set Server=" > temp.bat & call temp.bat !!
dir \\%Server%\C$ !!
Since some of the sample lines of code can extend over two or three lines, I will use a double shriek (!!) to indicate the end of a long line. When you try these examples, do not include the double shrieks.
If you already had the IP address, you could just type dir \\{ipaddress}\c$, but bear with me. The above AWK command looks complicated at first, but it’s really quite simple. It takes the seventh line of the output of the nbtstat command (NR==7; try capturing the output from nbtstat and checking for yourself) and prints set Server= followed by the first 15 characters of the first word (substr($1,1,15)), which is the name of the server. Then, it redirects this output to a batch file (> temp.bat) and runs that batch file (& call temp.bat), thus setting the environment variable Server to the server name.
Now, I’ll describe the implementation of AWK under Microsoft Windows. If you learn by trial and error like I do, feel free to skip to the end and try out some of the working examples.
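The line-selection and substring steps can be tried in isolation. The following is a minimal sketch, assuming a POSIX shell with a standard awk on the path (so single quotes work, unlike on the NT command line). The input is made up for illustration and is not real nbtstat output, so the line of interest is line 2 rather than line 7.

```shell
# Made-up three-line input; NOT real nbtstat output.
sample='header line
LONGSERVERNAME123456 <20> UNIQUE
trailer line'

# NR==2 selects the second input line.
# substr($1,1,15) keeps the first 15 characters of its first word.
# s is a variable set on the command line, after the program text.
line=$(printf '%s\n' "$sample" | awk 'NR==2 { print s substr($1,1,15) }' s='set Server=')

echo "$line"
```

This sketch joins s and the name by concatenation (a space between them in the program) rather than a comma, so no space appears after the equals sign in the generated line.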

General overview
AWK runs its code against each line of a file or command output. The complete line can be referenced in the code as $0. Each line is split into words by a separator, which is a space by default and can be changed. The words are referenced in the code as $1, $2, $3, etc. The number that follows $ can be an expression (e.g., $(a+b)), and statements can be grouped together in braces ({...}). Code can be written to a file, which is then passed to the AWK command as a parameter, but code also can be built into a string on the command line (as in the above example). The input that AWK processes can be piped from another command (such as echo, date /t, or any other command that produces output), or it can be text that’s contained in a file or files.
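As a small illustration of $0, numbered words, and an expression after $, here is a sketch that assumes a POSIX shell and a standard awk. The two input lines are invented for the example.

```shell
# Invented two-line input for this sketch.
input='alpha beta gamma
one two three'

# $0 is the complete line.
whole=$(printf '%s\n' "$input" | awk 'NR==1 { print $0 }')

# $2 is the second space-separated word of the line.
second=$(printf '%s\n' "$input" | awk 'NR==2 { print $2 }')

# The number after $ can be an expression: a+b is 3, so this is $3.
third=$(printf '%s\n' "$input" | awk 'NR==1 { a = 1; b = 2; print $(a+b) }')

printf '%s\n' "$whole" "$second" "$third"
```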

Command line
AWK can be called with a file name as the first parameter. This file contains the AWK code that will be executed on the input to AWK. For example:
AWK codefile.AWK *.tab
This command executes the code that’s in codefile.AWK on the contents of any files in the current directory with the extension .tab. The .AWK extension is discretionary; it can be anything—or nothing. However, if there are two files with the same name (one with the extension AWK and one with no extension), AWK will use the one that’s specified on the command line. If the code file has the extension .AWK and there isn’t another file with the same name and no extension, the extension can be omitted. For example, if filename.AWK and filename are in the current directory, the command AWK filename *.tab will use filename, and the command AWK filename.AWK *.tab will use filename.AWK. If only filename.AWK is in the current directory, on the other hand, the command AWK filename *.tab will use filename.AWK.

In-line code
The most convenient way of using AWK is to include the code as a parameter on the command line, as in:
AWK "{print $1}" *.doc
This command prints the first space-separated word on each line of each file that’s found with the extension .doc. It’s the most efficient method of using AWK in batch files because there isn’t another file to remember. However, there are limitations, including the length of the command line, both in DOS and as it’s passed to the AWK program. When the latter limit is exceeded, the program will report that it’s unable to execute AWK.
On the command line, you can specify a word separator. The default word separator is a space. When you’re working with CSV files, however, you may want to choose the comma as the word separator, which can be done as follows:
AWK -F, "{print $1}" *.csv
This command prints the first comma-separated word on each line of each file that’s found with the extension .csv.
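The effect of -F can be seen on a couple of lines fed through a pipe. This sketch assumes a POSIX shell and a standard awk, and the CSV content is invented for illustration.

```shell
# Invented CSV input for this sketch.
csv='name,city
Ann Lee,Leeds'

# With -F, the comma is the separator, so "Ann Lee" stays one word.
by_comma=$(printf '%s\n' "$csv" | awk -F, 'NR==2 { print $1 }')

# With the default separator, the same line splits at the space instead.
by_space=$(printf '%s\n' "$csv" | awk 'NR==2 { print $1 }')

printf '%s\n' "$by_comma" "$by_space"
```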

Quotation marks within quotation marks?
Where the code is built into the command line, it is enclosed within quotation marks. This situation presents a problem when you need to include literal text that contains spaces in the output. Such text must be enclosed within quotation marks, but you can’t use quotation marks within quotation marks. Unfortunately, you can’t use single quotes, either. The solution is to set variables (they are actually literals, but the value with which they start can be changed within the code) on the command line. For example:
Echo This is a test | AWK "{print $0,etc}" etc="of passing literals to AWK" !!
This command will output the following:
This is a test of passing literals to AWK
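For comparison, the same idea can be sketched on a POSIX shell with a standard awk (both assumptions). Single quotes are available there, so the quoting problem does not arise, but a variable assignment placed after the program text works the same way.

```shell
# etc is assigned on the command line, after the program text.
# The comma in print inserts the output separator (a space by default).
out=$(echo 'This is a test' | awk '{ print $0, etc }' etc='of passing literals to AWK')

echo "$out"
```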

The trouble with backslash
The backslash character requires special handling. For example, given a server name and share, how would you build the UNC "\\server1\Cdrive"? The obvious solution is as follows:
Echo server1 Cdrive | AWK "{print bs,bs,$1,bs,$2 }" bs="\" !!
However, this command will output
server1 Cdrive
The backslash has been ignored entirely. You should note that a comma between variables in the print instruction produced a space in the output. Also note that the variable bs produces no output. Using two backslashes doesn’t work, either; it produces the same output. Removing the quotes in the above example produces the correct output, though.
Echo server1 Cdrive | AWK "{print bs bs $1 bs $2 }" bs=\ !!
This command will output
\\server1\Cdrive
Separating the variables in a print instruction by a space actually concatenates them, but using a comma separates each value in the output with the Output Field Separator (OFS). The default separator is a space. When a literal ends in a backslash, you shouldn’t use quotes.
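The difference between concatenation and the comma can be sketched as follows, assuming a POSIX shell and a standard awk rather than the NT command line. There the backslash is passed with -v, and awk interprets escape sequences in the assigned value, so two backslashes in the value yield one.

```shell
# Inside single quotes the shell passes both backslashes through;
# awk then reads the escape sequence and stores a single backslash in bs.

# A space between items concatenates them.
unc=$(echo 'server1 Cdrive' | awk -v bs='\\' '{ print bs bs $1 bs $2 }')

# A comma puts the Output Field Separator (a space by default) between items.
spaced=$(echo 'server1 Cdrive' | awk -v bs='\\' '{ print bs, bs, $1, bs, $2 }')

# OFS can be changed; here it is set to a hyphen.
dashed=$(echo 'server1 Cdrive' | awk -v OFS='-' '{ print $1, $2 }')

printf '%s\n' "$unc" "$spaced" "$dashed"
```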
Echo server1 Cdrive | AWK "{print bs,$1,bs,$2 }" bs="\xxx\" !!
This command will output
server1 Cdrive
Now, type:
Echo server1 Cdrive | AWK "{print bs,$1,bs,$2 }" bs=\xxx\ !!
This command will correctly output
\xxx\ server1 \xxx\ Cdrive
However, this method can’t be used for a string that contains a space.
Echo server1 Cdrive | AWK "{print bs,$1,bs,$2 }" bs=\xx x\ !!
This command line will produce the following error:
AWK: can’t find file x\
If you need to use more than one instruction in the code, each instruction must end with a semicolon.
Echo This is the input | AWK "{print $1,$2; print $4}" !!
This command will output the following:
This is
input
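The same two-statement program can be checked on a POSIX shell with a standard awk (an assumption; the article's examples target the NT command line). Each print statement writes its own output line.

```shell
# Two statements in one action, separated by a semicolon.
out=$(echo 'This is the input' | awk '{ print $1, $2; print $4 }')

# Count the output lines to confirm that each print produced one.
lines=$(printf '%s\n' "$out" | wc -l | tr -d ' ')

printf '%s\n' "$out"
```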
Try not to go overboard with the amount of coding that you include on the command line, or you’ll eventually run into problems. Either AWK will fail to run because there’s a limit to the number of characters that can be passed to the command, or you’ll hit the command line length limit for NT. Usually, you’ll hit AWK’s limit first. If AWK fails but you’re absolutely sure that your code is correct, you should remove some characters and try again. If you run into line-length problems, you can often get away with removing all unnecessary spaces. For serious processing, you ought to put the AWK code into a file. The structure of such a file would be:
BEGIN {statements}
/pattern/ {statements}
END {statements}
The BEGIN and END groups are optional, as are the /pattern/ structure and the group of statements that follow it. Thus, legally, you could have an empty script, but it would be rather pointless. On Wednesday, I’ll cover the language structure of AWK.
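A code file with all three parts might look like the following sketch. It assumes a POSIX shell and a standard awk, which takes the code file through the -f option rather than as a bare first parameter. The pattern, the input lines, and the temporary file are all invented for illustration.

```shell
# Write a small AWK program to a temporary file.
prog=$(mktemp)
cat > "$prog" <<'EOF'
# BEGIN runs once, before any input is read.
BEGIN   { count = 0 }
# This group runs for every input line that contains "error".
/error/ { count++; print NR ": " $0 }
# END runs once, after the last input line.
END     { print count " matching line(s)" }
EOF

# Feed four invented lines to the program.
report=$(printf '%s\n' 'all good' 'disk error' 'ok' 'net error' | awk -f "$prog")
rm -f "$prog"

printf '%s\n' "$report"
```

Dropping the BEGIN or END group from the file leaves a valid program, which matches the point that both are optional.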

Richard Charrington’s computer career began when he started working with PCs—back when they were known as microcomputers. Starting as a programmer, he worked his way up to the lofty heights of a Windows NT systems administrator, and he has done just about everything in between. Richard has been working with Windows since before it had a proper GUI and with Windows NT since it was LANManager. Now a contractor, he has slipped into script writing for Windows NT and has built some very useful auto-admin utilities.

The authors and editors have taken care in preparation of the content contained herein, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for any damages. Always have a verified backup before making any changes.
