
Quick tips for Perl programmers

Who wouldn't want a little free code in their inbox, as long as it's not another one of those script viruses? Check out this compilation of Perl tips that could save you time down the road.


Here are a few quick Perl tips that could save you time down the road.

Negating regex matches
The regular expression binding operator (=~) is in standard use within Perl scripts, such as:
 
if ($debug =~ /true/i)
 

This syntax checks to see if the $debug variable is set to some form of "true." But what if you want to test for the opposite? Instead of using the sloppy if statement with an empty first branch:
 
if ($debug =~ /true/i) {
    # debug is true
} else {
    # debug is false
}
 

we can simply negate the test. While this would seem simple enough, the actual syntax is not immediately obvious. The normal negation syntax for a numeric comparison in an if statement is:
 
if ($x != $y) { ... }
 

but a regular expression match doesn't use the != operator. Instead, we can negate the regex test by using !~:
 
if ($debug !~ /true/i) {
    # the debug variable doesn't contain 'true'
    # run this
}
 

Be sure to comment this piece of code so an unsuspecting reader will know what this means. This is just a precautionary measure in case someone else uses your code.
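
As a minimal sketch of the negated test in context (the $debug value and the @notes array are made up for illustration), the following shows that !~ behaves the same as wrapping an ordinary =~ match in unless:

```perl
use strict;
use warnings;

# Hypothetical setting; in a real script this might come from a config file.
my $debug = 'False';

my @notes;

# Negated match: true when $debug does NOT contain "true" (case-insensitive).
if ($debug !~ /true/i) {
    push @notes, 'negated match: debugging is off';
}

# Equivalent test written with unless and the ordinary binding operator.
unless ($debug =~ /true/i) {
    push @notes, 'unless form: debugging is off';
}

print "$_\n" for @notes;
```

Which form reads better is largely a matter of taste; either way, a short comment helps the next reader.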

Don't get greedy
Perl rookies may grasp the power of regular expressions, but they may have a difficult time understanding why a match doesn't work the way they expect, particularly when using the ".*" pattern.

Explain to these users that the regular expression engine scans the string from left to right and that quantifiers such as "*" and "+" are "greedy": each one matches as many characters as it can, then gives characters back only if the rest of the pattern would otherwise fail. The result is the longest possible match from the first starting point, which is not always the match you had in mind. This could lead to logical errors.

For example, if you wanted to strip the text from an HTML file, this would not work:
 
$htmlfiletext =~ s/<.*>//gs;
 

It would strip everything from the first "<" to the last ">", which in a typical file means from the opening <html> through the closing </html>, leaving nothing. (The "g" option repeats the substitution for every match, and the "s" option lets "." match newline characters as well.)

Perl 5 also provides a non-greedy form of each quantifier. The usage is the same, except that ".*?" takes the place of ".*":
 
$htmlfiletext =~ s/<.*?>//gs;
 

Now the line will run as expected: each match stops at the first occurrence of the character that follows the quantifier (which is ">" in this case), so only the tags themselves are removed.

The non-greedy form makes problems that once seemed difficult a snap!
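
To see the difference side by side, here is a small sketch. The sample markup and the variable names $greedy and $lazy are invented for illustration:

```perl
use strict;
use warnings;

# Made-up sample markup for illustration.
my $html = '<html><body><p>Hello, <b>world</b>!</p></body></html>';

# Greedy: .* runs from the first "<" to the last ">", so nothing is left.
my $greedy = $html;
$greedy =~ s/<.*>//gs;

# Non-greedy: .*? stops at the first ">" after each "<", so only tags go.
my $lazy = $html;
$lazy =~ s/<.*?>//gs;

print "greedy: '$greedy'\n";   # greedy: ''
print "lazy:   '$lazy'\n";     # lazy:   'Hello, world!'
```

Keep in mind that this is only a demonstration of quantifier behavior; real-world HTML (comments, ">" inside attribute values) is better handled by a proper parser.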

Dereferencing variables
A reference is simply a scalar value that points to another variable. Perl has no raw pointers, but a reference plays much the same role. For example, suppose you have a complex data structure that is composed of an array of references to other arrays, possibly representing the data in a table.

The first row returned would be $data[0], and the second row would be $data[1]. But the values stored at these positions are array references, which means you must "dereference" them to use the arrays themselves. In order to get the array of the second row, we would need to do this:
 
@second_row = @{$data[1]};
 

This tells Perl that you're getting the reference from @data, position 1, dereferencing it into an array, and assigning it to another array.

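Suppose you wanted to get the length of that row directly. You could do this:
 
$rowlength = scalar(@{$data[1]});
 

In this case, we dereference the array reference from @data and evaluate the resulting array in scalar context, which yields its number of elements. The related $# syntax, written here as $#{$data[1]}, returns the index of the last element, which is one less than the length. Learning how to dereference is key when working with complex data structures.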
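
The sketch below pulls these pieces together. The table contents and the names @second_row, $last_index, and $name are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical table: each element of @data is a reference to a row array.
my @data = (
    [ 'id', 'name',  'city'   ],
    [ 1,    'Alice', 'Boston' ],
    [ 2,    'Bob',   'Denver' ],
);

my @second_row = @{ $data[1] };         # copy of the second row
my $rowlength  = scalar @{ $data[1] };  # number of elements: 3
my $last_index = $#{ $data[1] };        # index of the last element: 2
my $name       = $data[1][1];           # a single cell: 'Alice'

print "row: @second_row\n";
print "length $rowlength, last index $last_index, name $name\n";
```

Note that @second_row is a copy: changing it does not affect the row stored in @data, whereas assigning through $data[1][1] would.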

Assigning and reassigning
Let's imagine that you asked a user for the title of a book, and his or her response was stored as $title. What if you also want to store that title in an XML file? To make sure the title doesn’t have any spaces in it when it’s stored, you create another variable called $title_nospaces.

The next step is to get the input from the user and convert it with a separate variable:
 
$title = $_;                  # get the title from this loop or whatever
$title_nospaces = $title;     # create a new variable from title
$title_nospaces =~ s/ /_/g;   # replace the spaces with underscores
 

While this method is simple and effective, it requires a little more typing than is necessary. Why not incorporate the whole regular expression assignment operation into one line?
 
($title_nospaces = $title) =~ s/ /_/g;
 

This assigns the value of $title to $title_nospaces (creating the new variable, if necessary) and then applies the substitution to $title_nospaces. This line doesn't have an effect on $title.

Although this syntax makes no functional difference, the shortcut saves typing and is a common Perl idiom worth recognizing.
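
Here is a short, self-contained sketch of the idiom; the sample title is made up. It also shows the /r modifier, which, assuming Perl 5.14 or later, returns the modified copy instead of changing the original:

```perl
use strict;
use warnings;

# Hypothetical user input.
my $title = 'The Camel Book';

# Copy first, then substitute on the copy; $title is left untouched.
(my $title_nospaces = $title) =~ s/ /_/g;

# Alternative for Perl 5.14+: /r returns the changed string.
my $title_r = $title =~ s/ /_/gr;

print "$title\n";           # The Camel Book
print "$title_nospaces\n";  # The_Camel_Book
print "$title_r\n";         # The_Camel_Book
```

The parentheses in the first form matter: without them, the substitution would bind to $title, and $title_nospaces would receive the count of replacements instead.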

Export array
You've created a library of functions that work with company files or HTML files, or maybe they're just functions you like to use. So why are you still copying those functions into scripts each time a new script is created? You've probably already used a Perl module in the form of a .pm file. These Perl modules are just libraries of functions that make programming certain things easier.

But how exactly do you create your own? Copy all your functions into a file, save it as a .pm file (such as yourlibrary.pm), and you're done. Well, almost.

The script using your library needs to be aware of the functions that the library makes available. The @EXPORT array, a specific array designed for this, contains a list of the available functions. Here is the typical usage for this array:
 
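package yourlibrary;      # must match the file name yourlibrary.pm

use Exporter ();
our @ISA = qw(Exporter);  # inherit Exporter's import method

# functions exported into the calling script by default
our @EXPORT = qw(
    function1
    function2
);

1;  # a module file must end with a true value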
 

Load the @EXPORT array with the names of the necessary functions, and they become available to any script that loads the library with a "use yourlibrary;" line. Just be sure the library is either in the same directory as the script or in one of the Perl library paths.
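
The following single-file sketch shows the whole round trip. The package name MyLibrary and the function bodies are hypothetical, and the BEGIN block with the %INC entry is only a trick to make the example runnable without a separate .pm file; in practice the package would live in its own MyLibrary.pm:

```perl
use strict;
use warnings;

# Stand-in for a separate MyLibrary.pm file (hypothetical module).
BEGIN {
    package MyLibrary;

    use Exporter ();
    our @ISA    = qw(Exporter);            # inherit Exporter's import method
    our @EXPORT = qw(function1 function2); # exported by default

    sub function1 { return 'one' }
    sub function2 { return 'two' }

    # Tell Perl the module is already loaded so "use" skips the file search.
    $INC{'MyLibrary.pm'} = __FILE__;
}

# In a real script this line is all that is needed.
use MyLibrary;

print function1(), "\n";   # one
print function2(), "\n";   # two
```

For larger libraries, many authors prefer listing functions in @EXPORT_OK instead, so that the calling script must ask for each name explicitly and its namespace stays uncluttered.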

Avoiding instantiation of function variables
The @_ array is used to hold the parameters passed into a function. Here’s an example function that reads two variables from the parameter array and sets the function’s result:
 
sub add {
my ($a, $b) = @_;
my $c = $a + $b;
return $c;
}
 

This code copies the first and second parameters passed into add into the lexical variables $a and $b. (In real code, choose other names, because $a and $b are also the special variables used by sort.) Although this is a simple example, consider what occurs behind the scenes. When the function is called, the parameter array is created, the $a and $b variables are created and initialized with the parameters, and $c is then created only to hold a temporary value. In a function that is called many times, this creation and freeing of temporary variables adds overhead.

Take a look at this revised version:
 
sub add {
return $_[0] + $_[1];
}
 

This version skips the temporary variables, so it uses a little less memory and time on each call. (For real addition, of course, you're encouraged to use the built-in + operator.) Avoid instantiating variables in subroutines unless the function is long and readability becomes a concern.
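
As a rough sketch of the two styles, plus one caveat worth knowing: the elements of @_ are aliases for the caller's variables, so writing to $_[0] changes the caller's data. The function names add_named, add_direct, and double_in_place are invented for this example:

```perl
use strict;
use warnings;

# Version with named lexical copies of the arguments.
sub add_named {
    my ($x, $y) = @_;
    return $x + $y;
}

# Version that reads the arguments straight from @_.
sub add_direct {
    return $_[0] + $_[1];
}

# Elements of @_ alias the caller's variables, so this modifies
# the variable that was passed in.
sub double_in_place {
    $_[0] *= 2;
    return;
}

my $n = 21;
double_in_place($n);

print add_named(2, 3), "\n";   # 5
print add_direct(2, 3), "\n";  # 5
print "$n\n";                  # 42
```

Because of that aliasing, direct access is best kept to short functions that only read their arguments; named copies are the safer default elsewhere.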
