Data Management

Oracle Tip: Understand Oracle 10g SQL Regular Expression support

With the release of Oracle 10g, an Oracle database will now have built-in regular expressions built upon the POSIX standard.

This article originally appeared in the Oracle e-newsletter. Click here to subscribe automatically.

The lack of flexible SQL regular expressions in Oracle 8 and 9i has finally been addressed with Oracle 10g. The Oracle database will now have built-in regular expressions built upon the POSIX standard.

The four new functions are: REGEXP_LIKE, REGEXP_INSTR, REGEXP_SUBSTR, and REGEXP_REPLACE. They're similar in usage to the Oracle SQL functions, LIKE, INSTR, SUBSTR, and REPLACE, but they use POSIX regular expressions instead of the old percent (%) and wildcard ( _ ) characters.

POSIX regular expressions consist of the standard metacharacters:

  • '^' for the beginning of a string
  • '$' for the end of string
  • '.' for any character
  • Ranges of characters, such as '[a-z]', for any ASCII lowercase letter, which is equivalent to the character class: "[[:lower:]]""
  • '?' allows a preceding character to match zero or one time
  • '+' allows a preceding character to match one or more times
  • '*' for zero or more times

You can specify an exact range of occurrences with "{m,n}" to mean "occurs from m to n times"; "{m}" to mean "exactly m times"; and "{m,}" to mean "at least m times". You can also group sets of characters with parentheses and alternate between groups with "|" (vertical bar or "pipe"). For example, the string, '^([a-z]+|[0-9]+)$' will match strings consisting entirely of lowercase letters or digits.

REGEXP_LIKE is similar to the LIKE operator. It resolves to TRUE if the first argument matches a regular expression. For example, WHERE REGEXP_LIKE(ENAME,'^J[AO]','i') will return rows when ENAME starts with JA or JO. The 'I' parameter makes the regular expression case-insensitive. You can also specify REGEXP_LIKE in CHECK constraints and function indexes. For example:

ALTER TABLE EMP ADD CONSTRAINT REGEX01
    CHECK (REGEXP_LIKE(ENAME,'^[[:alpha:]]+$'));

This will constrain the ENAME column to only contain alphanumeric characters (i.e., no spaces or punctuation are allowed). Attempts to insert or update such data will result in an ORA-2290, or check constraint violated.

REGEXP_INSTR is similar to the INSTR function. It returns the starting position of the first substring within a string that matches a regular expression. For example:

SELECT REGEXP_INSTR('The total is $400 for your purchase.','$[[:digit:]]+')
FROM DUAL;

This query will return 14, the starting position of $400 within the string. You may also specify the number of the occurrence; the starting position to begin searching from; and whether to return the position of the match or the position of the character following the match.

REGEXP_SUBSTR returns the substring that matches a regular expression. It's easier than using SUBSTR with REGEXP_INSTR and LENGTH to select the substring directly.

SELECT REGEXP_INSTR('one,two,three','[^,]*') FROM DUAL;

This query will return 'one', treating the first argument as a comma-delimited list and returning everything up to the first comma.

REGEXP_REPLACE will return the initial argument, replacing any matching substrings. For example:

SELECT REGEXP_REPLACE('The temperature is 23°F',
    '([[:digit:]])+°F',
    ('\1'-32)*5/9||'°C')
FROM DUAL;

This query will look for a pattern for Fahrenheit temperature and convert it to Celsius. It should return: 'The temperature is -5°C'.

For more information, check out O'Reilly's Oracle Regular Expressions Pocket Reference by Jonathan Gennick and Peter Linsley.

Scott Stephens worked for Oracle for more than 13 years in technical support, e-commerce, marketing, and software development. For more of his Oracle tips visit our Oracle Dev Tips Library.

Editor's Picks

Free Newsletters, In your Inbox