Developer

Oracle Tip: Create linguistic helper functions in SQL

You can generalize simple tasks in a globalized environment by breaking down a particular language into a set of grammar rules and exceptions to those rules. Scott Stephens demonstrates how you can you do just that in SQL.

In a globalized environment, many simple tasks become more difficult because of the variety of grammar rules in different languages. You can generalize these tasks by breaking down a particular language into a set of grammar rules and exceptions to those rules (as well as a base vocabulary). In some languages (e.g., Perl and Java), there are public domain modules that do linguistic transformations on text.

For a somewhat simple example, let's take the trick of converting a number into its spelled-out version (e.g., for writing checks and legal contracts). This trick has been around since the early days of Oracle and generally goes like this:

select to_char(to_date(12345,'J'),'Jsp') from dual;

Twelve Thousand Three Hundred Forty-Five

The TO_DATE function converts the number into a date using the Julian date format. Then, the TO_CHAR takes the date and formats it as a string of spelled-out numbers for the Julian date again. This trick has some limitations.

First, Julian dates are only valid in Oracle until the year 9999, so the largest possible value is 5373484. The lower end of the range is 1 or 4712BC. Also, since there was no year "zero," it isn't possible to generate the text "zero" without an extra DECODE or CASE statement. The third big limitation is that it ignores your NLS settings. No matter which language you use, the numbers are always spelled out in American English. The same problem exists for simple operations such as spelling out the day. For instance, try to generate the phrase "Cinco de Mayo" in Spanish:

alter session set nls_language = 'SPANISH';
select to_char(to_date('0505','MMDD'),'Ddspth Month') from dual;

Fifth Mayo

The grammar involved in generating numbers for most languages is actually fairly simple. Most of the work involves gathering all the different grammar rules and building up enough rules to generate the correct grammar patterns. (I'll avoid the problems involved in trying to match number and gender for now.)

First, I'll create two tables: one to hold the base words and exceptions, and a second to hold some simple template patterns for generating text. If a number exists in the first table, then my linguistic function will return that text. For every other number, I'll try to match it against a series of patterns and apply a template to generate the correct text.

create table numwords
(
    lang    varchar2(2),
    num     integer,
    word    varchar2(30),
    constraint numwords_pk primary key (lang,num)
);

create table numrules
(
    lang    varchar2(2),
    seq     integer,
    p1      integer,
    p2      integer,
    temp0   varchar2(30),
    temp    varchar2(30),
    constraint numrules_pk primary key (lang,seq)
);

Here is the code needed to generate the spelled-out version of a number. I'll stick to cardinal numbers (such as one, two, and three); however, these functions can be used to generate ordinals (first, second, third) and plural versions by listing more exceptions and patterns for each language:

REM — create a table of base words and exceptions
create or replace package genword
as
    function get_word(n number) return varchar2;
    function cardinal(n number) return varchar2;
end genword;
/
show errors;

create or replace package body genword
as
    function get_word(n number) return varchar2
    is
        l_word numwords.word%type;
    begin
        select word into l_word from numwords
         where lang = sys_context('userenv','lang') and num = n;
        return l_word;
    exception
        when no_data_found then
            return null;
    end;
    —
    function cardinal(n number) return varchar2
    is
        p number;       — power
        t varchar2(30); — template
        v number;       — lower portion
        l_word      numwords.word%type;
    begin
        if n < 0 then
            l_word := get_word(-1);
            if l_word is null then
                return null;
            end if;
            return l_word||' '||cardinal(-n);
        end if;
        l_word  := get_word(n);
        if l_word is not null then
            return l_word;
        end if;
        for row in
        (
            select * from numrules
             where lang = sys_context('userenv','lang')
             order by seq
        )
        loop
            if length(n) <= row.p1 + row.p2 then
                p := power(10,row.p2);
                v := mod(n,p);
                if row.seq = 0 then
                    if n < 20 then
                        return replace(row.temp0,'~2',cardinal(v));
                    end if;
                else
                    if v = 0 then
                        return replace(row.temp0,'~1',cardinal(n/p));
                    else
                        return replace(replace(nvl(row.temp,'~1 ~2'),
                            '~1',cardinal(n-v)),
                            '~2',cardinal(v));
                    end if;
                end if;
            end if;
        end loop;
        return 'NUMBER TOO LARGE';
    end cardinal;
end genword;
/
show errors;

Finally, here is some data I gathered for English and German. I also copy the data from American English to British English and use the terms "thousand million" and "million million" instead of "billion" and "trillion" (in the American sense), which are often a source of confusion outside the United States. This is enough data to generate spelled-out versions of whole integers between -999,999,999,999 and 999,999,999,999, including zero.

REM — American English
insert into numwords values ('US',-1,'negative');
insert into numwords values ('US',0,'zero');
insert into numwords values ('US',1,'one');
insert into numwords values ('US',2,'two');
insert into numwords values ('US',3,'three');
insert into numwords values ('US',4,'four');
insert into numwords values ('US',5,'five');
insert into numwords values ('US',6,'six');
insert into numwords values ('US',7,'seven');
insert into numwords values ('US',8,'eight');
insert into numwords values ('US',9,'nine');
insert into numwords values ('US',10,'ten');
insert into numwords values ('US',11,'eleven');
insert into numwords values ('US',12,'twelve');
insert into numwords values ('US',13,'thirteen');
insert into numwords values ('US',15,'fifteen');
insert into numwords values ('US',18,'eighteen');
insert into numwords values ('US',20,'twenty');
insert into numwords values ('US',30,'thirty');
insert into numwords values ('US',40,'forty');
insert into numwords values ('US',50,'fifty');
insert into numwords values ('US',80,'eighty');
insert into numwords select 'GB',num,word from numwords where lang = 'US';

insert into numrules values ('US',0,1,1,'~2teen',null);
insert into numrules values ('US',1,1,1,'~1ty','~1-~2');
insert into numrules values ('US',2,1,2,'~1 hundred',null);
insert into numrules values ('US',3,3,3,'~1 thousand',null);
insert into numrules values ('US',4,3,6,'~1 million',null);
insert into numrules select 'GB',seq,p1,p2,temp0,temp
    from numrules where lang = 'US';
insert into numrules values ('US',5,3,9,'~1 billion',null);
insert into numrules values ('GB',5,3,9,'~1 thousand million',null);
insert into numrules values ('US',6,3,12,'~1 trillion',null);
insert into numrules values ('GB',6,3,12,'~1 million million',null);

REM - German
insert into numwords values ('D',-1,'negativ');
insert into numwords values ('D',0,'null');
insert into numwords values ('D',1,'eins');
insert into numwords values ('D',2,'zwei');
insert into numwords values ('D',3,'drei');
insert into numwords values ('D',4,'vier');
insert into numwords values ('D',5,unistr('f\00FCnf'));
insert into numwords values ('D',6,'sechs');
insert into numwords values ('D',7,'sieben');
insert into numwords values ('D',8,'acht');
insert into numwords values ('D',9,'neun');
insert into numwords values ('D',10,'zehn');
insert into numwords values ('D',11,'elf');
insert into numwords values ('D',12,unistr('zw\00F6lf'));
insert into numwords values ('D',13,'dreizehn');
insert into numwords values ('D',16,'sechzehn');
insert into numwords values ('D',17,'siebzehn');
insert into numwords values ('D',20,'zwanzig');
insert into numwords values ('D',21,'einundzwanzig');
insert into numwords values ('D',30,unistr('drei\00DFig'));
insert into numwords values ('D',31,unistr('einunddrei\00DFig'));
insert into numwords values ('D',41,'einundvierzig');
insert into numwords values ('D',51,unistr('einundf\00FCnfzig'));
insert into numwords values ('D',60,'sechzig');
insert into numwords values ('D',70,'siebzig');
insert into numwords values ('D',100,'hundert');
insert into numwords values ('D',1000,'tausend');
insert into numwords values ('D',1e6,'eine Million');
insert into numwords values ('D',1e9,'eine Milliarde');
insert into numwords values ('D',1e12,'eine Billion');

insert into numrules values ('D',0,1,1,'~2zehn',null);
insert into numrules values ('D',1,1,1,'~1zig','~2und~1');
insert into numrules values ('D',2,1,2,'~1hundert','~1~2');
insert into numrules values ('D',3,3,3,'~1tausend','~1 und ~2');
insert into numrules values ('D',4,3,6,'~1 Millionen',null);
insert into numrules values ('D',5,3,9,'~1 Milliarden',null);
insert into numrules values ('D',6,3,12,'~1 Billionen',null);

Here are some simple SQL statements that use the functions and the above data. You can try setting the language to 'GERMAN', or 'ENGLISH' to try out the other two sets of data:

SQL> alter session set nls_language = 'AMERICAN';
SQL> select genword.cardinal(123456789) from dual;

one hundred twenty-three million four hundred fifty-six thousand seven hundred
 eighty-nine

TechRepublic's Oracle newsletter covers automating Oracle utilities, generating database alerts, solving directed graph problems, and more. Automatically subscribe today!

Editor's Picks

Free Newsletters, In your Inbox