
Work more efficiently with Perl regular expressions

Don't let Perl regular expressions get the better of you. Here are two tips to make your code a little more readable and workable.


By Charles Galpin

Perl regular expressions (regexes) are often rather cryptic and hard to deal with, but these two tips should help. We'll look first at a technique to make the expressions more readable. Then, we'll explain a way to pick out the nth match of an expression within a string.

Using the /x regex modifier
Working with Perl regex can be a chore, especially since regex contents may be difficult to read. The problems are compounded when you have to edit a legacy regex you didn't write. The code often appears more like white noise than coherent Perl. A simple solution to this problem is the /x regex modifier.

The /x modifier tells the regex engine to ignore literal white space (spaces, tabs, newlines, etc.) in the pattern, so you can use these characters to lay out your regex in a reader-friendly arrangement. If you actually need the regex to match a white space character, precede the character with a backslash or use an escape such as \s. White space inside a bracketed character class, such as [ab c], is not ignored.

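The /x modifier has another advantage: It lets you use the pound character (#) to write comments within the regex. Everything from a pound character to the end of the line is treated as a comment and ignored by the regex engine. To match a literal pound character, escape it with a backslash. A comment must not contain the pattern's closing delimiter, because Perl locates the end of the pattern before it processes the comments.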

Using the /x modifier usually makes a regex take up more lines, but the formatting and comments it allows make the regex far more readable, both for you and for the next programmer who has to work with your code.
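To make the points above concrete, here is a minimal sketch of two patterns written with /x. The helper names parse_date and error_number, the date format, and the "Error #42" message format are assumptions made up for illustration; they are not part of the original tip.

```perl
use strict;
use warnings;

# Hypothetical helper: pull a year, month, and day out of text
# containing a date such as "2004-03-15".
sub parse_date {
    my ($text) = @_;
    if ($text =~ m{
            (\d{4})   # year: four digits
            -         # literal hyphen separator
            (\d{2})   # month: two digits
            -         # literal hyphen separator
            (\d{2})   # day: two digits
        }x) {
        return ($1, $2, $3);
    }
    return;
}

# Hypothetical helper: extract the number from text such as "Error #42".
# Shows how to match a literal space and a literal pound sign under /x.
sub error_number {
    my ($text) = @_;
    my ($num) = $text =~ m{
        Error     # the literal word
        \         # one literal space, escaped so that x keeps it
        \#        # a literal pound sign, escaped so it does not start a comment
        (\d+)     # capture the error number
    }x;
    return $num;
}

my ($year, $month, $day) = parse_date("released on 2004-03-15");
print "$year/$month/$day\n";                       # prints 2004/03/15
print error_number("Error #42 occurred"), "\n";    # prints 42
```

The patterns use m{...} rather than /.../ as the delimiter so that a stray slash in a comment cannot end the pattern early; with this delimiter, any braces inside the comments would need to be balanced instead.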

Nth occurrence in regex
Some things in Perl seem like they should be easy but prove to be more time-consuming than first appearances would suggest. One case in point: picking out one particular match when a regex matches several times within the same string. For example, a line in a log file may contain up to 10 sets of numbers, and you want only the third. A split() might solve this problem, but a more reliable solution is to apply the regex repeatedly with the /g modifier and count the matches until you reach the one you want:
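# hard-code a value for the example
$_ = "this is 123 first, the 234 second, the 345 third, and the 456 fourth";

my $n = 3;       # look for the third set of numbers
my $count = 0;
while (/(\d+)/g) {       # in scalar context, /g resumes after the previous match
    if (++$count == $n) {
        print "Number $n occurrence was $1\n";
        last;            # stop scanning once the nth match is found
    }
}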


When run, this example prints "Number 3 occurrence was 345". The loop keeps reapplying the regex, each time resuming where the previous match ended, and counts the matches as it goes. To look for the third word instead (which is 123 in the sample string), you could replace the while statement with:
while(/(\w+)/g) {
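The counting loop can also be wrapped in a reusable subroutine, and for short strings a list-context match is a compact alternative. The following is a minimal sketch under those assumptions; the name nth_match and its argument order are hypothetical, not something the original tip defines.

```perl
use strict;
use warnings;

# Hypothetical helper: return the nth (1-based) match of $re in $text,
# or undef if there are fewer than $n matches.
sub nth_match {
    my ($text, $re, $n) = @_;
    my $count = 0;
    # In scalar context, each /g match resumes where the last one ended.
    while ($text =~ /($re)/g) {
        return $1 if ++$count == $n;
    }
    return;
}

my $line = "this is 123 first, the 234 second, the 345 third, and the 456 fourth";

# In list context, a /g match returns every captured value at once,
# so the nth match is simply element n-1 of the resulting list.
my @numbers = $line =~ /(\d+)/g;
print "Third number: $numbers[2]\n";                        # prints 345

print "Third word: ", nth_match($line, qr/\w+/, 3), "\n";   # prints 123
```

The list-context form collects every match before you pick one, which is fine for a log line. The loop form stops as soon as it reaches the nth match, which may matter for very long strings.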
