Regular expressions for everyone: The basics

Vincent Danen goes over some basic regular expressions. They are handy for developers, of course, and similar pattern-matching ideas even carry over to Google searches.

If you are a Linux user or administrator, regular expressions probably come up fairly often; if they don't, you're missing out. Many command-line programs, such as grep, and scripts written in Perl, Python, or PHP use, or can use, regular expressions.

So what is a regular expression?

A regular expression (also known as a regex or regexp) is a pattern that matches strings of text by characters, words, or patterns of characters. There are three primary types of regular expressions: POSIX regexps, Perl-based regexps, and simple regexps. The basics, however, are largely the same among them. Perl-based regexps are also used well beyond Perl: Python, Ruby, Java, and JavaScript use the same syntax or a close derivative of it, as does the PCRE library, to name a few.

Here is a list of basic Perl-based regular expressions and what they do:

  • . Matches any single character (by default, any character except a newline)
  • * Matches 0 or more of the preceding character
  • + Matches 1 or more of the preceding character
  • ? Matches 0 or 1 occurrences of the preceding character or group (the preceding character or group is optional)
  • \d Matches a single digit ('[[:digit:]]' in POSIX)
  • \w Matches any word character (letters, digits, and the underscore; roughly '[[:alnum:]_]' in POSIX)
  • [ABC] Matches any single character from the class (i.e. 'A' or 'B' or 'C')
  • [ABC]+ Matches 1 or more characters from the class
  • $ Matches the end of the string
  • ^ Matches the beginning of the string
  • | Matches on the expression either before or after '|'
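
To make the list above concrete, here is a minimal Python sketch that runs each basic construct through the standard re module. The helper name first_match and the sample strings are made up for illustration; only re.search comes from Python itself.

```python
import re

def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# (pattern, sample text) pairs; the sample strings are illustrative only.
samples = [
    (r"c.t", "a cat sat"),       # . matches any single character
    (r"ab*c", "xx ac yy"),       # * allows zero occurrences of b
    (r"ab+c", "ac abbc"),        # + needs at least one b
    (r"colou?r", "color"),       # ? makes the u optional
    (r"\d+", "order 66"),        # \d+ matches a run of digits
    (r"\w+", "hello_world!"),    # \w includes the underscore
    (r"[ABC]+", "xxBACxx"),      # one or more characters from the class
    (r"^foo", "a foo"),          # ^ anchors at the start, so this finds nothing
    (r"bar$", "foo bar"),        # $ anchors at the end
]

for pattern, text in samples:
    print(f"{pattern!r:12} on {text!r:16} -> {first_match(pattern, text)!r}")
```

Note that re.search looks for the pattern anywhere in the text, which is why the ^ and $ anchors matter in the last two samples.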

The syntax is not pretty, and as a result regular expressions can look messy and be difficult to grasp. However, they are very powerful. Here are some examples of regular expressions in action:

foo|bar

The above matches either "foo" or "bar".

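https?://(www\.)?foo\.com

The above matches either https://www.foo.com, https://foo.com, http://foo.com, or http://www.foo.com. The backslashes make each dot match a literal period; an unescaped "." matches any character, so "foo.com" would also match "fooXcom".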

 [fb]?oo

The above matches "foo", "boo", or "oo".

 [fb]+oo

The above matches "foo", "boo", "fboo", "ffoo", and so on, but not "oo".
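
The four examples above can be checked with a short Python sketch. It uses re.fullmatch so that the whole candidate string has to match, and it escapes the dots in the URL pattern so they match literal periods. The candidate strings and the names patterns, check, and results are assumptions chosen for illustration.

```python
import re

# Patterns from the examples above, each with a few candidate strings.
patterns = {
    r"foo|bar": ["foo", "bar", "baz"],
    r"https?://(www\.)?foo\.com": [
        "http://foo.com",
        "https://www.foo.com",
        "ftp://foo.com",
    ],
    r"[fb]?oo": ["foo", "boo", "oo", "zoo"],
    r"[fb]+oo": ["foo", "boo", "fboo", "ffoo", "oo"],
}

def check(pattern, candidates):
    """Map each candidate to True if the entire string matches pattern."""
    return {c: re.fullmatch(pattern, c) is not None for c in candidates}

results = {p: check(p, c) for p, c in patterns.items()}

for pattern, outcome in results.items():
    print(pattern, outcome)
```

Whole-string matching is a deliberate choice here: a tool that searches within a line, such as grep, would report "zoo" as a hit for [fb]?oo, because the string contains "oo".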

Knowing regular expressions is, obviously, most useful when programming; however, it can be very useful with command-line tools as well. Grep, when called as egrep (or as grep -E), uses POSIX extended regular expressions, which elevates grep to a whole new level of convenience. The find command also supports using regular expressions to find files, and likewise the awk and sed tools support regular expressions.
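
As a rough illustration of what a regex-aware line filter does, the following Python sketch keeps only the lines that contain a match, much like a single alternation pattern passed to grep. The function name grep and the log lines are hypothetical; this is not how grep itself is implemented.

```python
import re

def grep(pattern, lines):
    """Return the lines that contain at least one match for pattern."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Hypothetical log lines, invented for this example.
log = [
    "Jan 1 sshd[101]: Accepted password for alice",
    "Jan 1 sshd[102]: Failed password for bob",
    "Jan 1 cron[55]: job started",
    "Jan 2 sshd[103]: Invalid user eve",
]

# One pattern with | replaces two separate searches.
hits = grep(r"Failed|Invalid", log)
for line in hits:
    print(line)
```

The single pattern Failed|Invalid finds both kinds of line in one pass, which is the convenience the alternation operator buys on the command line as well.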

For system administrators, many programs, such as Apache, can use regular expressions in their configuration files. The "*Match" directives in Apache (e.g., <DirectoryMatch> and <RedirectMatch>) support regular expressions, as do rewrite rules.
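
The idea behind a pattern-based redirect can be sketched in Python: one regular expression with a capture group covers a whole family of paths that would otherwise need one rule each. This is only an emulation of the concept, not Apache configuration; the function name redirect, the /old/ and /new/ paths, and the default pattern are all assumptions made up for the example.

```python
import re

def redirect(path, pattern=r"^/old/(.+)\.html$", target=r"/new/\1"):
    """Return the rewritten path if pattern matches path, else None."""
    m = re.match(pattern, path)
    # expand() substitutes the captured group for \1 in the target template.
    return m.expand(target) if m else None

# Hypothetical request paths.
for path in ["/old/about.html", "/old/docs/intro.html", "/other/x.html"]:
    print(path, "->", redirect(path))
```

Here the group (.+) captures whatever sits between /old/ and .html, so a single rule handles every page under /old/ and leaves unrelated paths alone.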

Regular expressions aren't just for the "elite" or for sysadmins; even Google search queries accept a few regex-like operators, such as OR and the * wildcard. They are useful for many people, even if you learn only the basics noted above. They make searching for things much easier and can reduce multiple commands or directives to a single one, which can ultimately enhance productivity.

About

Vincent Danen works on the Red Hat Security Response Team and lives in Canada. He has been writing about and developing on Linux for over 10 years and is a veteran Mac user.