Perl is everywhere. Most UNIX-like systems come with it, and those that don’t have it hurriedly installed. Mac OS X comes with it, and any NT administrator worth his or her salt installs it straight away.
Growing availability is making Perl an increasingly popular choice for system scripting and routine text processing. This is especially the case in the UNIX world, where for most administrators and users, the hammering of ASCII text is a near daily task.
However, the same users who would write a three-line Perl script for a task they intend to run frequently stoop to the use of old standbys like cut, wc, and sed when working on the command line. This need not be the case. Perl provides some extremely convenient command-line functionality that can help make UNIX command-line tools a thing of the past. The discussion below assumes a UNIX Bourne, ksh, or csh command shell. All of the Perl examples are portable to NT but will require different escaping.
It all begins with the -e option, which tells the Perl executable that the next argument is a Perl statement to be compiled and run. The easiest use is as a clumsy replacement for the echo command:
perl -e 'print "Hello world";'
Notice how single quotes were necessary to group the subsequent text into a single argument. Double quotes could have been used as well, but then the double quotes inside the command would have required additional escaping. The output of that command is, predictably, the string Hello world.
Admittedly, running a single Perl command once isn't terribly useful, although even one line can have great utility given Perl's feature-rich command set. Where Perl really begins to shine is with the addition of the -p option, which tells Perl to run the -e command once for every line of standard input provided. For each of these runs, the value of the line being processed is stored in the traditional $_ variable. After each line is filtered through the -e command, it is printed to standard output. That automatic printing is omitted if the option -n is used in lieu of -p.
The most immediately obvious use for this new option is as a sed-style search and replace tool:
perl -p -e 's/old/new/g;'
That command will replace all instances of the string old in standard input with the string new and will send the results to standard output. To run that search and replace on a text file, we’ll use a pipe and file redirection like this:
cat myfile | perl -p -e 's/old/new/g;' > newfile
The modified file contents can now be found in the file named newfile.
Thus far, we haven't seen anything that couldn't be done as easily using classic UNIX command-line tools. The echo command handles our Hello world, and sed handles our search and replace with less typing. However, Perl surpasses the capabilities of those old friends when you add the -i (in-place editing) option.
In-place editing allows Perl to modify a file where it stands. There's no need to send the output to a new file, where it will likely just be renamed over the original. To run our old/new substitution on a file named myfile, you just need to use the command:
perl -i -p -e 's/old/new/g;' myfile
The -i option can take an optional parameter indicating that it should create backup files and specifying what their suffix should be. What’s more, instead of listing a single file as an argument, you can provide a list of files for the -e command to operate on. Using those two little tricks, you can construct a command that will run a search and replace on all the files in your home directory and leave backups to boot:
find ~ -type f | xargs perl -i.bak -p -e 's/old/new/g;'
Used this way, the find command provides a list of all the files in and below your home directory. The xargs command takes standard input, in this case the list of files, and appends it to the command immediately following it. Omitting the .bak will inhibit the creation of backup files, if you’re in a dangerous sort of mood.
The command provided using the -e option is executed on each line, but if you'd rather have word-level granularity you can add the -a (auto-split) option. When used with -n or -p, the -a option causes Perl to break each input line on white space into the array @F as if it were passed through Perl's split function. This option can be used to easily work with columnar data. A command like:
perl -i -n -a -e 'print @F[2,4];' mychart
looks a little complicated, but it's easy to parse when you examine one piece at a time. The -i option means we're going to edit the specified file (mychart, in this case) directly. The -n indicates that the command given by the -e should be run for each line, but that nothing should be printed automatically. The -a is our auto-split mode, and the -e command prints the contents of the third and fifth columns (they're numbered starting with zero), concatenated together. Because auto-split on white space discards each line's trailing newline, the output runs together on a single line unless the print statement adds "\n" itself.
Adding one last option, -F, allows us to tune the behavior of the auto-split to break the lines into array elements on any boundary that can be specified using a regular expression. Thus, a command such as this:
perl -i -n -a -F, -e 'print join ",", @F[1..$#F];' mytable
could be used to throw away the first column of a comma-separated values (CSV) table. Again the -i is in-place edit. The -n option tells Perl to process each line and to print nothing on its own. The -a enables auto-split, which the -F tells to split on commas. The command provided rejoins all the column data from column one through the last column (a range that excludes column zero) with commas and prints the output.
Like most things related to Perl, the syntax can get a bit hairy, but the rewards are great. The more you use command-line Perl, the easier things become. Before too long, you’ll find you’ve forgotten the command-line switches for cut. You’ll be the better for it.