
Simple filters in Perl, Ruby, and Bourne shell

A filter is a program that takes input data, operates on it, and produces modified output. Filters are among the most useful kinds of admin scripts, and they are easy to write, especially in languages such as the Bourne shell, Perl, and Ruby.

In The Art of Unix Programming, Eric Raymond discusses the usefulness of a type of utility called a "filter":

Many programs can be written as filters, which read sequentially from standard input and write only to standard output.

One example the book gives is wc, a program that counts characters (or bytes), "words", and lines in its input and prints those counts as output. For instance, listing the contents of the lib subdirectory of a chroot directory could produce this output:

~/tmp/chroot/lib> ls

libc.so.7 libedit.so.7 libncurses.so.8

You could pipe the output of ls to wc to get the number of lines, words, and characters:

~/tmp/chroot/lib> ls | wc

3 3 39

Writing your own filter scripts is incredibly easy in languages such as Perl, Ruby, and the Bourne shell.

Perl script

Perl's standard filter idiom is simple and clean. Some people claim that Perl code is unreadable, but they have probably never read well-written Perl.

#!/usr/bin/env perl

while (<>) {
    # code here to alter the contents of $_
    print $_;
}

To operate on the contents of a file named file.txt:

~> script.pl file.txt

You can also use pipes to direct the output of another program to the script as a text stream:

~> ls | script.pl

Finally, you can call the script without piping any text stream or naming any file as a command line argument:

~> script.pl

If you do so, it reads from standard input, so you can type input manually, one line at a time. To tell it you are done, hold down [Ctrl] and press [D] at the start of a line. The terminal then signals end-of-file (EOF) to the program.

If you want to do something other than alter the contents of Perl's implicit scalar variable $_, you can print some other output instead. $_ holds one line of input at a time, and you can use it in whatever operations you wish before producing a line of output. Output does not even need to be produced inside the while loop. For instance, roughly duplicating the standard behavior of wc is easy enough:

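#!/usr/bin/env perl

my @output = (0, 0, 0);     # line, word, and character counts

while (<>) {
    $output[0]++;           # one more line
    $output[1] += split;    # split with no arguments splits $_ on whitespace;
                            # in scalar context it returns the number of fields
    $output[2] += length;   # length of $_, including its trailing newline
}

printf "%8d%8d%8d\n", @output;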

Unlike wc, this script does not report separate counts for each file named on the command line, nor does it print the file names. Instead, it simply adds up the totals for all of them at once. It does not offer any of wc's command line options either, but it serves to illustrate how a filter can be constructed.

The remaining examples cover only the basic filter input-handling idiom itself and leave the implementation of wc-like behavior as an exercise for the reader.

Ruby script

Ruby does not have a single idiom that is obviously the "standard" way to do it. At least two options work quite well. The first uses a Ruby iterator method, in typically Rubyish style:

#!/usr/bin/env ruby

$<.each do |line|
  # code here to alter the contents of line
  print line
end

The second uses a while loop and avoids the kind of "weird" punctuation variable, such as $<, that some programmers remember with distaste from Perl:

#!/usr/bin/env ruby

while line = gets
  # code here to alter the contents of line
  print line
end

Operating on the contents of a file, taking input interactively, or accepting a text stream as input works the same as for the equivalent Perl script.

Shell script

This is the least powerful filter idiom presented here because the Bourne shell does not provide the same succinct facilities for input handling as Perl and Ruby:

#!/bin/sh

while IFS= read -r data; do
    # code here to alter the contents of $data
    # (IFS= keeps leading/trailing whitespace; -r keeps backslashes)
    printf '%s\n' "$data"
done
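To make the placeholder comment concrete, here is a small hypothetical filter built on the same loop. It numbers each input line, roughly like cat -n. The function name number_lines is made up for this illustration. The sketch uses IFS= read -r and printf rather than a bare read and echo, so leading whitespace and backslashes in the input pass through unchanged.

```shell
#!/bin/sh
# Hypothetical example filter: prefix each input line with its line number,
# roughly like "cat -n". Reads standard input, writes standard output.
number_lines() {
    n=0
    # IFS= preserves leading/trailing whitespace; -r keeps backslashes literal
    while IFS= read -r data; do
        n=$((n + 1))
        # "alter" the line: a right-aligned number, a tab, then the original text
        printf '%6d\t%s\n' "$n" "$data"
    done
}

number_lines
```

Like the skeleton, this loop only sees lines that end in a newline, so a final line without one is silently dropped.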

To operate on the contents of a file named file.txt, you have to use a redirect. The script ignores its command line arguments, so naming a file as an argument just leaves it waiting for input on standard input. Calling the script with a redirect is still simple enough, though:

~> script.sh < file.txt

The redirection operator < tells the shell to open file.txt and connect it to the standard input of the script.sh process, so the script receives the file's contents as a text stream. You can also use pipes to direct the output of another program to the script as a text stream, as with the other examples:

~> ls | script.sh

The behavior of the Perl and Ruby examples can be duplicated in the Bourne shell, but it takes a bit more code. You need a conditional statement that handles both the case where a filename is given as a command line argument and the case where a text stream arrives on standard input by some other means. It hardly seems worth the effort just to avoid typing a redirect.
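For comparison, here is a minimal sketch of that extra code, assuming a POSIX sh. The function names filter and main are illustrative only. When file names are given, the script filters each file in turn. Otherwise it falls back to standard input, much as Perl's <> and Ruby's $< do.

```shell
#!/bin/sh
# Sketch: accept file names as arguments, or fall back to standard input.

# The basic filter idiom, wrapped in a function so it can be reused.
filter() {
    while IFS= read -r data; do
        # code here to alter the contents of $data
        printf '%s\n' "$data"
    done
}

main() {
    if [ "$#" -gt 0 ]; then
        # One or more file names given: redirect each file into the filter.
        for file in "$@"; do
            filter < "$file"
        done
    else
        # No arguments: read from a pipe, a redirect, or the keyboard.
        filter
    fi
}

main "$@"
```

A shorter variant drops the conditional by piping cat -- "$@" into the loop, because cat reads standard input when it is given no file operands.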

Go forth and code

In my TechRepublic article Seven ideas for learning how to program, I suggested that writing Unix admin scripts could serve as a great way for new programmers to practice the craft of coding. Filters are among the most useful command line utilities in a Unix environment, and as demonstrated here, they can be surprisingly easy to write with a minimum of programming skill.

Regardless of your programming experience, these simple filter script idioms in three common sysadmin scripting languages can help any Unix sysadmin do his or her job better.

About

Chad Perrin is an IT consultant, developer, and freelance professional writer. He holds both Microsoft and CompTIA certifications and is a graduate of two IT industry trade schools.
