This article originally appeared in the Oracle e-newsletter. Click

here to subscribe automatically.

The lack of flexible SQL regular expressions
in Oracle 8 and 9i has
finally been addressed with Oracle 10g. The Oracle database will
now have built-in regular expressions built upon the POSIX
standard.

The four new functions are: REGEXP_LIKE,
REGEXP_INSTR, REGEXP_SUBSTR, and REGEXP_REPLACE. They’re similar in
usage to the Oracle SQL functions, LIKE, INSTR, SUBSTR, and
REPLACE, but they use POSIX regular expressions instead of the old
percent (%) and wildcard ( _ ) characters.

POSIX regular expressions consist of the
standard metacharacters:

  • ‘^’ for the beginning of a string
  • ‘$’ for the end of string
  • ‘.’ for any character
  • Ranges of characters, such as ‘[a-z]’, for
    any ASCII lowercase letter, which is equivalent to the character
    class: “[[:lower:]]””
  • ‘?’ allows a preceding character to match
    zero or one time
  • ‘+’ allows a preceding character to match one
    or more times
  • ‘*’ for zero or more times

You can specify an exact range of occurrences
with “{m,n}” to mean “occurs from m to n times”; “{m}” to mean
“exactly m times”; and “{m,}” to mean “at least m times”. You can
also group sets of characters with parentheses and alternate
between groups with “|” (vertical bar or “pipe”). For example, the
string, ‘^([a-z]+|[0-9]+)$’ will match strings consisting entirely
of lowercase letters or digits.

REGEXP_LIKE is similar to the LIKE operator. It
resolves to TRUE if the first argument matches a regular
expression. For example, WHERE REGEXP_LIKE(ENAME,’^J[AO]’,’i’) will
return rows when ENAME starts with JA or JO. The ‘I’ parameter
makes the regular expression case-insensitive. You can also specify
REGEXP_LIKE in CHECK constraints and function indexes. For
example:

ALTER TABLE EMP ADD CONSTRAINT REGEX01
    CHECK
(REGEXP_LIKE(ENAME,’^[[:alpha:]]+$’));

This will constrain the ENAME column to only
contain alphanumeric characters (i.e., no spaces or punctuation are
allowed). Attempts to insert or update such data will result in an
ORA-2290, or check constraint violated.

REGEXP_INSTR is similar to the INSTR function.
It returns the starting position of the first substring within a
string that matches a regular expression. For example:

SELECT REGEXP_INSTR(‘The total is $400 for your
purchase.’,’$[[:digit:]]+’)
FROM DUAL;

This query will return 14, the starting
position of $400 within the string. You may also specify the number
of the occurrence; the starting position to begin searching from;
and whether to return the position of the match or the position of
the character following the match.

REGEXP_SUBSTR returns the substring that
matches a regular expression. It’s easier than using SUBSTR with
REGEXP_INSTR and LENGTH to select the substring directly.

SELECT REGEXP_INSTR(‘one,two,three’,'[^,]*’) FROM
DUAL;

This query will return ‘one’, treating the
first argument as a comma-delimited list and returning everything
up to the first comma.

REGEXP_REPLACE will return the initial
argument, replacing any matching substrings. For example:

SELECT REGEXP_REPLACE(‘The temperature is
23°F’,
    ‘([[:digit:]])+°F’,
    (‘\1′-32)*5/9||’°C’)
FROM DUAL;

This query will look for a pattern for
Fahrenheit temperature and convert it to Celsius. It should return:
‘The temperature is -5°C’.

For more information, check out O’Reilly’s Oracle
Regular Expressions Pocket Reference
by Jonathan Gennick
and Peter Linsley.

Scott Stephens worked for Oracle for more than 13 years in technical support, e-commerce, marketing, and

software development. For more of his Oracle tips visit our Oracle Dev Tips

Library.