Software Development

Oracle Tip: Create functions to join and split strings in SQL

Learn how to take a comma delimited list of values in a single string and use it as a table of values.
A common task when selecting data from a database is to take a set of values a query returns and format it as a comma delimited list. Another task that's almost as common is the need to do the reverse: Take a comma delimited list of values in a single string and use it as a table of values.

Many scripting languages, such as Perl and Python, provide functions that do this with their own language-specific list of values; so it's surprising that, as of yet, this functionality isn't a standard part of SQL functions. I've seen some pretty ugly looking code that involved complex declarations with MAX and DECODE, but that solution usually only returns a limited set of values. With some of the new Oracle9i and above features, it's possible to do this yourself.

I'd like to use a "join" functionality to specify a query that returns a single column and a delimiter, and then receive a simple string that contains a list of those values separated by my delimiter. The query part can be passed to the function as a REF CURSOR using the new SQL CURSOR function. The delimiter should default to a comma, since that is the most commonly used delimiter. So, the syntax should be:

SQL> select join(cursor(select ename from emp)) from dual;

SMITH,ALLEN,WARD,JONES,MARTIN,BLAKE,CLARK,SCOTT,KING,TURNER,ADAMS,
JAMES,FORD,MILLER

The following code will perform this function:

create or replace function join
(
    p_cursor sys_refcursor,
    p_del varchar2 := ','
) return varchar2
is
    l_value   varchar2(32767);
    l_result  varchar2(32767);
begin
    loop
        fetch p_cursor into l_value;
        exit when p_cursor%notfound;
        if l_result is not null then
            l_result := l_result || p_del;
        end if;
        l_result := l_result || l_value;
    end loop;
    return l_result;
end join;
/
show errors;

The PL/SQL User's Guide says you always have to declare a package that defines a ref cursor; however, the database already defines this as SYS_REFCURSOR in the STANDARD package. The PL/SQL code should be fairly straightforward. There is a limit of 32,767 characters on the output string and the input column.

Since all datatypes can be automatically converted to character strings, you can use any datatype in the cursor--as long as it's one column. For example:

SQL> select join(cursor(select trunc(hiredate,'month') from emp),'|') from
dual;

01-DEC-80|01-FEB-81|01-FEB-81|01-APR-81|01-SEP-81|01-MAY-81|01-JUN-81|01-APR-87|01-NOV-81|01-SEP-81|01-MAY-87|01-DEC-81|
01-DEC-81|01-JAN-82

There's another extra benefit. Since the cursor is part of the SQL statement, you can easily join the query inside the join with the outer query. Here is a query that returns each table and a list of the columns that make up its primary key:

SQL> select table_name,join(cursor(select column_name from user_cons_columns
                 where constraint_name = user_constraints.constraint_name
                 order by position)) columns
       from user_constraints where constraint_type = 'P';
 

View the output in Table A.

You can also use this "join" function to compare two sets of ordered data. For example, the following query will check that an index has been created on a foreign key (which helps prevent locking the table and aids master-detail queries):

column status format a7
column table_name format a30
column columns format a40 word_wrapped

select decode(indexes.table_name,null,'missing','ok') status,
       constraints.table_name,
       constraints.columns
  from
    (select table_name,
            constraint_name,
            join(cursor
            (
                select column_name
                  from user_cons_columns
                 where constraint_name = user_constraints.constraint_name
            )) columns
        from user_constraints
       where constraint_type = 'R'
       ) constraints,
    (select table_name, index_name,
            join(cursor
            (
                select column_name
                  from user_ind_columns
                 where index_name = user_indexes.index_name
            )) columns
      from user_indexes) indexes
    where constraints.table_name = indexes.table_name (+)
      and constraints.columns = indexes.columns (+);

This query works by executing two subqueries: one that queries foreign keys and another that queries indexes. The join between these two queries is on the table name and the list of columns used in creating the foreign key and the index, taken as an ordered list of values.

We'd also like the reverse functionality: to have the ability to take a single comma-delimited value and treat it as if it were a column in a table. We can take advantage of the TABLE SQL function and PL/SQL function tables to do this quite easily, but first, we must define the result type to be a TABLE type of the largest possible string.

create or replace type split_tbl as table of varchar2(32767);
/
show errors;

create or replace function split
(
    p_list varchar2,
    p_del varchar2 := ','
) return split_tbl pipelined
is
    l_idx    pls_integer;
    l_list    varchar2(32767) := p_list;
AA
    l_value    varchar2(32767);
begin
    loop
        l_idx := instr(l_list,p_del);
        if l_idx > 0 then
            pipe row(substr(l_list,1,l_idx-1));
            l_list := substr(l_list,l_idx+length(p_del));

        else
            pipe row(l_list);
            exit;
        end if;
    end loop;
    return;
end split;
/
show errors;

With this function, I can run a query like this:

SQL> select * from table(split('one,two,three'));

one
two
three

The PL/SQL procedure will parse its argument and return each part through a PIPELINE; the TABLE function allows it to be used in the FROM statement, so it appears to SQL as if it is a table with one column and three rows. (Remember that the column being returned is named COLUMN_VALUE if you want to use the value elsewhere.)

Here's an example query, which shows a dynamic IN condition in a query. The split function generates a table of values, which can be used on a row-by-row basis.

SQL> select ename from emp
      where to_char(hiredate,'YY')
         in (select column_value from table(split('81,82')));

View the output in Table B.

If you want, you can join a column and then split it, too:

SQL> select * from table(split(join(cursor(select ename from emp))));

And, you can use this method to merge sets of values:

create table t(a varchar2(200));
insert into t values('81,82');
insert into t values('84,85');

SQL> select * from table(split(join(cursor(select a from t))));

81
82
84
85

These are just simple example functions. You could extend join to enclose values in quotes and escape quotes inside the values. You could extend split to allow a REF CURSOR parameter instead of a single VARCHAR2, so it could split up sets of columns as well.

TechRepublic's Oracle newsletter covers automating Oracle utilities, generating database alerts, solving directed graph problems, and more. Automatically subscribe today!

2 comments
awiersba
awiersba

Using the JOIN function against a table with a thousand rows can result in "ORA-01000: maximum open cursors exceeded" (at least on Oracle database 11gR2).


To avoid that, add a line to CLOSE the cursor p_cursor between the END LOOP and the RETURN statement:

============

    end loop;

  close  p_cursor;
    return l_result;

============


Hope that helps.

vburykh
vburykh

this one 10% faster then with variable list


 FUNCTION SPLIT (
        P_STR   IN      VARCHAR2,
        P_SEP   IN      VARCHAR2 DEFAULT ','
) RETURN varchar_tbl_t
IS
        arr varchar_tbl_t := new varchar_tbl_t();
        i pls_integer := 1;
        j pls_integer := 0;
        sep_len pls_integer := LENGTH(P_SEP);
BEGIN
        LOOP
                j := INSTR(P_STR, P_SEP, i);
                IF j > 0 THEN
                        arr.extend(1);
                        arr(arr.last) := SUBSTR(P_STR,i,j-i);
                        i := j+sep_len;
                ELSE
                        arr.extend(1);
                        arr(arr.last) := SUBSTR(P_STR,i);
                        RETURN arr;
                END IF;
        END LOOP;
END SPLIT;