Three handy Perl approaches let you grab data for UNIX apps

You can use Perl to eliminate some routine tasks in UNIX, such as grabbing data from external commands. We'll explore three different methods, including backticks and the open() function.


Many UNIX admins and system architects prefer to tackle routine tasks with Shell or Perl scripts, which often involves obtaining data from external commands. Shell scripts work well for simple tasks, but they can get really ugly when more complex logic is required. That’s where Perl excels. I can recommend three approaches to obtaining external input in Perl:
  • Backticks
  • system() and exec()
  • open() and parse

Let's take a look at these three techniques, with an extended example of open() and parse.

Backticks
The simplest method for reading output is using backticks. In a UNIX shell, enclosing a command in backticks (`) runs the command and substitutes its output. If you have any experience with Shell scripting, you already know how this works. Place the fully qualified path to the executable (including any extra arguments) in backticks, and the Perl interpreter will attempt to execute that command. Perl will assign to a variable whatever the command writes to STDOUT. For example, to retrieve a string with the current date, your script might include a line that looks like this:
$date=`/usr/bin/date`;

The variable $date now contains the current date as a string, including the trailing newline that the command prints (Perl strings are not null-terminated). As in Shell scripts, the variable $? holds the status of the last command run; in Perl, the command's exit code sits in the high byte, so use $? >> 8 to extract it. This is by far the simplest method of grabbing external data, and it is most convenient with single-line output. Output that contains multiple lines can be parsed using a combination of split() and a foreach loop, as shown in Listing A.
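As a rough sketch of the split() and foreach idea (not necessarily identical to Listing A), the following captures a command's output and walks through it line by line. The helper name output_lines and the ls / command are illustrative assumptions, not part of the original script.

```perl
use strict;
use warnings;

# Run a command with backticks and return its output as a list of lines.
# output_lines is a hypothetical helper; the command string goes to the shell as given.
sub output_lines {
    my ($command) = @_;
    my $output = `$command`;            # everything the command wrote to STDOUT
    return () unless defined $output;   # undef means the command could not be started
    return split(/\n/, $output);        # one element per line, newlines removed
}

# Illustrative usage: list the entries of the root directory.
foreach my $line (output_lines('ls /')) {
    print "entry: $line\n";
}
```

Backticks also return a list of lines directly when called in list context, so the split() step is a matter of taste.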

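The backtick approach is somewhat less efficient than the last method we'll cover, open(), because backticks collect the entire output in memory before your script sees any of it. Error-checking also is more difficult. Backticks capture only STDOUT, so error messages written to STDERR do not end up in the output variable ($date, in our first example) unless you redirect them with 2>&1, at which point they are mixed in with the normal output. To tell a system error (executable not found) from an application-specific error, you have to inspect $?.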
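One way to handle error-checking with backticks is sketched below, assuming you are happy to fold STDERR into the captured text. The helper name run_and_check and the deliberately missing path are made up for illustration.

```perl
use strict;
use warnings;

# Run a command with backticks and report both its output and how it ended.
# Returns (output, exit_code); exit_code is -1 if the command could not be started.
# run_and_check is a hypothetical helper name.
sub run_and_check {
    my ($command) = @_;
    my $output = `$command 2>&1`;          # 2>&1 folds STDERR into the captured text
    my $status = $?;                       # save the status before anything changes it
    return (undef, -1) if $status == -1;   # -1 means the command never started
    return ($output, $status >> 8);        # the high byte holds the exit code
}

# Illustrative usage: the path below is assumed not to exist.
my ($text, $code) = run_and_check('ls /nonexistent-path-for-demo');
print "exit code: $code\n";
print "output: ", (defined $text ? $text : "(none)\n");
```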

system() and exec()
The next approach you can employ relies on the system() or exec() functions. These functions behave much like the C library's system(3) and exec(3) family of calls found on most UNIX platforms. Both functions take an executable name as their first argument and the list of arguments for the executable as the rest of their arguments. If you pass a single string instead, Perl hands it to the shell whenever it contains shell metacharacters.

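The exec() function replaces the running Perl script with the executable, so it never returns if it succeeds; it returns false only if the command cannot be started (for example, because the executable can't be found). Your script therefore never sees the output or the return value of the executable. The system() function waits for the executable to finish and then returns its wait status; shift that value right by eight bits (>> 8) to get the exit code. Any output the called program sends to STDOUT goes straight to your script's own STDOUT and is not captured. This is a useful call when you are not interested in what the executable sends to STDOUT, but if you need to parse this data, our next method is the one you need.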
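A minimal sketch of checking the result of system() follows. The helper name exit_code_of is an assumption, and the child program is simply the Perl interpreter itself ($^X) so the example does not depend on any particular external command.

```perl
use strict;
use warnings;

# Run a program with system() in list form (no shell involved) and
# return its exit code, or -1 if the program could not be started.
# exit_code_of is a hypothetical helper name.
sub exit_code_of {
    my @command = @_;
    my $status = system(@command);    # waits for the program to finish
    return -1 if $status == -1;       # the program never started
    return $status >> 8;              # the exit code lives in the high byte
}

# $^X is the path of the running Perl interpreter.
my $code = exit_code_of($^X, '-e', 'exit 2');
print "child exited with code $code\n";   # child exited with code 2
```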

open() and parse
The most powerful mechanism to execute an external process is to open a pipe to the external source using the built-in Perl function open(). Since open() allows you to read the output just as you would read a file, this method gives you a great deal of flexibility.

Learn by example: Knocking off the parent process
In the example below (you can download the code here), an application forks off many processes, each with the same name. To shut down the application cleanly, the administrator must send the parent process (and only the parent process) the kill signal, and it will take care of the rest. In our example, the processes have the name “mike”.

In theory, our script is quite simple: run /usr/bin/ps and look for the process “mike” that has a parent process of 1. The actual scripting is a bit more complex, however. The first step is to open a pipe to /usr/bin/ps (your exact syntax on /usr/bin/ps may vary):
open(PS, "/usr/bin/ps -ef |") or die "Cannot run /usr/bin/ps: $!";

The open() function takes the name of the file handle (PS) as the first argument and the name of the command as the second. In this case, we want to use the STDOUT of the command as our input, which is why the pipe (|) is at the end of the command, just as it would appear on a UNIX command line.
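The same idea can be written with a lexical file handle and the list form of open(), which passes the arguments directly to the program without involving the shell. This is only a sketch: read_command is a hypothetical helper, and the child command is the Perl interpreter itself ($^X) rather than /usr/bin/ps.

```perl
use strict;
use warnings;

# Open a read pipe from a command and return all of its output lines.
# read_command is a hypothetical helper name.
sub read_command {
    my @command = @_;
    # '-|' means "read from this command's STDOUT".
    open(my $pipe, '-|', @command) or die "Cannot run $command[0]: $!";
    my @lines = <$pipe>;              # read every line the command prints
    close($pipe);                     # close() waits for the child and sets $?
    return @lines;
}

my @lines = read_command($^X, '-e', 'print "first\nsecond\n"');
print "got ", scalar(@lines), " lines\n";   # got 2 lines
```

After close(), $? holds the command's status in the same format that system() returns.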

The next step is to parse through the output, line by line. This is accomplished with a while loop:
while (<PS>) {
    # loop logic here
}

This while loop runs once for each line of output, where a line ends with the regular UNIX new line character, otherwise known as \n. On each pass, Perl assigns the current line to its special variable $_.

An often forgotten step is to strip off the \n character by using chop() or chomp(). The function chop() simply takes off the last character of the line, without checking whether that character is indeed a new line character. chomp() has some intelligence and will remove only a trailing new line character. In both cases, if you don't pass the function an argument, it operates on the $_ variable, which in our case is the line in question.
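The difference between the two functions is easiest to see on a string that has no trailing new line. The helper names below are made up for the comparison.

```perl
use strict;
use warnings;

# Remove a trailing newline, if there is one (hypothetical helper).
sub strip_newline {
    my ($line) = @_;
    chomp $line;
    return $line;
}

# Remove the last character, whatever it is (hypothetical helper).
sub strip_last_char {
    my ($line) = @_;
    chop $line;
    return $line;
}

print "chomp: '", strip_newline("mike\n"),   "' '", strip_newline("mike"),   "'\n";
# chomp: 'mike' 'mike'
print "chop:  '", strip_last_char("mike\n"), "' '", strip_last_char("mike"), "'\n";
# chop:  'mike' 'mik'
```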

Finally, we can deal with the data in each line. The most obvious method is to use the split() function to split the line into manageable sections. The problem is that the lines do not all contain the same number of elements separated by white space: depending on how long the process has been running, a line can look like any of the examples shown in Listing B.

The next code fragment, shown in Listing C, splits up the first five elements of the line and then splits the rest (contents in $rest) depending on the date pattern.

The split() function needs a bit of explanation. It takes the regular expression to split by as the first argument, and the string to split as the second. An optional third argument sets the maximum number of elements to return; once that limit is reached, the last element holds the rest of the string unsplit. split() returns a list of the items it has split, which can be stored in an array, such as @tmp=split(), or in individual variables, specified as ($var1, $var2)=split().
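Here is a small illustration of the limit argument and of both ways to receive the result. The sample record and variable names are invented for the example.

```perl
use strict;
use warnings;

# An invented record: three short fields followed by free-form text.
my $record = "alice 42 staff   notes with   several words";

# Without a limit, every run of white space is a separator.
my @all = split(/\s+/, $record);

# With a limit of 4, the fourth element keeps the rest of the string intact.
my ($name, $id, $group, $rest) = split(/\s+/, $record, 4);

print scalar(@all), " fields without a limit\n";   # 7 fields without a limit
print "rest: '$rest'\n";                           # rest: 'notes with   several words'
```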

The regular expression /\s+/ splits by one or more instances of white space. The pattern in the second line of Listing C looks for text that starts with two digits between 0 and 9, followed by a colon, which is how a start time such as 10:15 begins. (If you need to brush up on regular expressions, then check out this article.)
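The two-stage split described above might look something like the sketch below. It is not Listing C: the column layout (UID, PID, PPID, C, start time or date, TTY, TIME, command) and the sample lines are assumptions about what ps -ef prints, and parse_ps_line is a hypothetical name. Check the output of ps on your own system before relying on it.

```perl
use strict;
use warnings;

# Parse one line of assumed "ps -ef" output into a hash reference.
# Returns nothing for the header line or for lines that do not fit the layout.
sub parse_ps_line {
    my ($line) = @_;
    chomp $line;
    # split ' ' also skips leading white space; the fifth element keeps the rest.
    my ($uid, $pid, $ppid, $c, $rest) = split(' ', $line, 5);
    return unless defined $rest && $pid =~ /^\d+$/;

    my $cmd;
    if ($rest =~ /^\d\d:/) {
        # A start time such as 10:15:02 is a single field.
        (undef, undef, undef, $cmd) = split(' ', $rest, 4);
    } else {
        # A start date such as "Jan 05" occupies two fields.
        (undef, undef, undef, undef, $cmd) = split(' ', $rest, 5);
    }
    return { uid => $uid, pid => $pid, ppid => $ppid, cmd => $cmd };
}

# Invented sample lines in the two assumed shapes.
my $recent = "    root  1234     1  0 10:15:02 ?        0:01 /usr/local/bin/mike -d\n";
my $older  = "    root  5678  1234  0   Jan 05 ?        0:07 /usr/local/bin/mike -d\n";

for my $sample ($recent, $older) {
    my $proc = parse_ps_line($sample);
    print "pid=$proc->{pid} ppid=$proc->{ppid} cmd=$proc->{cmd}\n";
}
```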

Now that you have the process name in $cmd, it’s a simple matter to determine if this is the correct process and if its parent is init (process ID 1):
if (($cmd =~ /mike/) && ($ppid == 1)) {
    kill 'TERM', $pid;
}
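kill() returns the number of processes it managed to signal, so it is worth checking the result. The sketch below tries this out on a throwaway child process rather than a real daemon; the helper name terminate is hypothetical.

```perl
use strict;
use warnings;

# Send TERM to a process and report whether the signal was delivered.
# terminate is a hypothetical helper name.
sub terminate {
    my ($pid) = @_;
    my $signalled = kill 'TERM', $pid;    # number of processes signalled
    warn "Could not signal $pid: $!\n" unless $signalled;
    return $signalled;
}

# Demonstration on a throwaway child process.
my $child = fork();
die "fork failed: $!" unless defined $child;
if ($child == 0) {
    sleep 60;       # child: wait to be terminated
    exit 0;
}

my $delivered = terminate($child);
waitpid($child, 0);                       # reap the child
my $signal = $? & 127;                    # low bits hold the terminating signal
print "delivered=$delivered, child ended by signal $signal\n";
```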


That’s it! Remember, you can download the entire program here.

UNIX administration made easy
The series of functions used here is the basis for most scripts related to UNIX administration. Most scripts I write use a variation on the open(), while(), chomp(), split() theme. I hope this gives you a good idea of how to start automating complex tasks. If you want to share a Perl script with the Builder.com community, please contact the editors.
