In a globalized environment, many simple
tasks become more difficult because of the variety of grammar rules
in different languages. You can generalize these tasks by breaking
down a particular language into a set of grammar rules and
exceptions to those rules (as well as a base vocabulary). In some
languages (e.g., Perl and Java), there are public domain modules
that do linguistic transformations on text.

For a somewhat simple example, let’s take the
trick of converting a number into its spelled-out version (e.g.,
for writing checks and legal contracts). This trick has been around
since the early days of Oracle and generally goes like this:

select to_char(to_date(12345,’J’),’Jsp’) from
dual;

Twelve Thousand Three Hundred Forty-Five

The TO_DATE function converts the number into a
date using the Julian date format. Then, the TO_CHAR takes the date
and formats it as a string of spelled-out numbers for the Julian
date again. This trick has some limitations.

First, Julian dates are only valid in Oracle
until the year 9999, so the largest possible value is 5373484. The
lower end of the range is 1 or 4712BC. Also, since there was no
year “zero,” it isn’t possible to generate the text “zero” without
an extra DECODE or CASE statement. The third big limitation is that
it ignores your NLS settings. No matter which language you use, the
numbers are always spelled out in American English. The same
problem exists for simple operations such as spelling out the day.
For instance, try to generate the phrase “Cinco de Mayo” in
Spanish:

alter session set nls_language = ‘SPANISH’;
select to_char(to_date(‘0505′,’MMDD’),’Ddspth Month’) from
dual;

Fifth Mayo

The grammar involved in generating numbers for
most languages is actually fairly simple. Most of the work involves
gathering all the different grammar rules and building up enough
rules to generate the correct grammar patterns. (I’ll avoid the
problems involved in trying to match number and gender for
now.)


First, I’ll create two tables: one to hold the base words and
exceptions, and a second to hold some simple template patterns for
generating text. If a number exists in the first table, then my
linguistic function will return that text. For every other number,
I’ll try to match it against a series of patterns and apply a
template to generate the correct text.

create table numwords
(
    lang    varchar2(2),

    num     integer,

    word    varchar2(30),

    constraint numwords_pk primary key
(lang,num)
);

create table numrules
(
    lang    varchar2(2),

    seq     integer,

    p1      integer,

    p2      integer,

    temp0   varchar2(30),
    temp    varchar2(30),

    constraint numrules_pk primary key
(lang,seq)
);

Here is the code needed to generate the
spelled-out version of a number. I’ll stick to cardinal numbers
(such as one, two, and three); however, these functions can be used
to generate ordinals (first, second, third) and plural versions by
listing more exceptions and patterns for each language:

REM — create a table of base words and
exceptions
create or replace package genword
as
    function get_word(n number) return
varchar2;
    function cardinal(n number) return
varchar2;
end genword;
/
show errors;

create or replace package body genword
as
    function get_word(n number) return
varchar2
    is
        l_word
numwords.word%type;
    begin
        select word into
l_word from numwords
         where lang =
sys_context(‘userenv’,’lang’) and num = n;
        return
l_word;
    exception
        when no_data_found
then
            return
null;
    end;
    —
    function cardinal(n number) return
varchar2
    is
        p
number;       — power
        t varchar2(30); —
template
        v
number;       — lower
portion
        l_word      numwords.word%type;

    begin
        if n < 0
then
            l_word
:= get_word(-1);
            if
l_word is null then
                return
null;
            end
if;
            return
l_word||’ ‘||cardinal(-n);
        end if;
        l_word  :=
get_word(n);
        if l_word is not
null then
            return
l_word;
        end if;
        for row in
        (
            select
* from numrules
             where
lang = sys_context(‘userenv’,’lang’)
             order
by seq
        )
        loop
            if
length(n) <= row.p1 + row.p2 then
                p
:= power(10,row.p2);
                v
:= mod(n,p);
                if
row.seq = 0 then
                    if
n < 20 then
                        return
replace(row.temp0,’~2′,cardinal(v));
                    end
if;
                else

                    if
v = 0 then
                        return
replace(row.temp0,’~1′,cardinal(n/p));
                    else

                        return
replace(replace(nvl(row.temp,’~1 ~2′),
                           
‘~1’,cardinal(n-v)),
                            ‘~2’,cardinal(v));

                    end
if;
                end
if;
            end
if;
        end loop;
        return ‘NUMBER TOO
LARGE’;
    end cardinal;
end genword;
/
show errors;

Finally, here is some data I gathered for
English and German. I also copy the data from American English to
British English and use the terms “thousand million” and “million
million” instead of “billion” and “trillion” (in the American
sense), which are often a source of confusion outside the United
States. This is enough data to generate spelled-out versions of
whole integers between -999,999,999,999 and 999,999,999,999,
including zero.

REM — American English
insert into numwords values (‘US’,-1,’negative’);
insert into numwords values (‘US’,0,’zero’);
insert into numwords values (‘US’,1,’one’);
insert into numwords values (‘US’,2,’two’);
insert into numwords values (‘US’,3,’three’);
insert into numwords values (‘US’,4,’four’);
insert into numwords values (‘US’,5,’five’);
insert into numwords values (‘US’,6,’six’);
insert into numwords values (‘US’,7,’seven’);
insert into numwords values (‘US’,8,’eight’);
insert into numwords values (‘US’,9,’nine’);
insert into numwords values (‘US’,10,’ten’);
insert into numwords values (‘US’,11,’eleven’);
insert into numwords values (‘US’,12,’twelve’);
insert into numwords values (‘US’,13,’thirteen’);
insert into numwords values (‘US’,15,’fifteen’);
insert into numwords values (‘US’,18,’eighteen’);
insert into numwords values (‘US’,20,’twenty’);
insert into numwords values (‘US’,30,’thirty’);
insert into numwords values (‘US’,40,’forty’);
insert into numwords values (‘US’,50,’fifty’);
insert into numwords values (‘US’,80,’eighty’);
insert into numwords select ‘GB’,num,word from numwords where lang
= ‘US’;

insert into numrules values (‘US’,0,1,1,’~2teen’,null);
insert into numrules values (‘US’,1,1,1,’~1ty’,’~1-~2′);
insert into numrules values (‘US’,2,1,2,’~1 hundred’,null);
insert into numrules values (‘US’,3,3,3,’~1 thousand’,null);
insert into numrules values (‘US’,4,3,6,’~1 million’,null);
insert into numrules select ‘GB’,seq,p1,p2,temp0,temp
    from numrules where lang = ‘US’;
insert into numrules values (‘US’,5,3,9,’~1 billion’,null);
insert into numrules values (‘GB’,5,3,9,’~1 thousand
million’,null);
insert into numrules values (‘US’,6,3,12,’~1
trillion’,null);
insert into numrules values (‘GB’,6,3,12,’~1 million
million’,null);

REM – German
insert into numwords values (‘D’,-1,’negativ’);
insert into numwords values (‘D’,0,’null’);
insert into numwords values (‘D’,1,’eins’);
insert into numwords values (‘D’,2,’zwei’);
insert into numwords values (‘D’,3,’drei’);
insert into numwords values (‘D’,4,’vier’);
insert into numwords values (‘D’,5,unistr(‘f\00FCnf’));
insert into numwords values (‘D’,6,’sechs’);
insert into numwords values (‘D’,7,’sieben’);
insert into numwords values (‘D’,8,’acht’);
insert into numwords values (‘D’,9,’neun’);
insert into numwords values (‘D’,10,’zehn’);
insert into numwords values (‘D’,11,’elf’);
insert into numwords values (‘D’,12,unistr(‘zw\00F6lf’));
insert into numwords values (‘D’,13,’dreizehn’);
insert into numwords values (‘D’,16,’sechzehn’);
insert into numwords values (‘D’,17,’siebzehn’);
insert into numwords values (‘D’,20,’zwanzig’);
insert into numwords values (‘D’,21,’einundzwanzig’);
insert into numwords values (‘D’,30,unistr(‘drei\00DFig’));
insert into numwords values
(‘D’,31,unistr(‘einunddrei\00DFig’));
insert into numwords values (‘D’,41,’einundvierzig’);
insert into numwords values
(‘D’,51,unistr(‘einundf\00FCnfzig’));
insert into numwords values (‘D’,60,’sechzig’);
insert into numwords values (‘D’,70,’siebzig’);
insert into numwords values (‘D’,100,’hundert’);
insert into numwords values (‘D’,1000,’tausend’);
insert into numwords values (‘D’,1e6,’eine Million’);
insert into numwords values (‘D’,1e9,’eine Milliarde’);
insert into numwords values (‘D’,1e12,’eine Billion’);

insert into numrules values (‘D’,0,1,1,’~2zehn’,null);
insert into numrules values (‘D’,1,1,1,’~1zig’,’~2und~1′);
insert into numrules values (‘D’,2,1,2,’~1hundert’,’~1~2′);
insert into numrules values (‘D’,3,3,3,’~1tausend’,’~1 und
~2′);
insert into numrules values (‘D’,4,3,6,’~1 Millionen’,null);
insert into numrules values (‘D’,5,3,9,’~1
Milliarden’,null);
insert into numrules values (‘D’,6,3,12,’~1 Billionen’,null);

Here are some simple SQL statements that use
the functions and the above data. You can try setting the language
to ‘GERMAN’, or ‘ENGLISH’ to try out the other two sets of
data:

SQL> alter session set nls_language =
‘AMERICAN’;
SQL> select genword.cardinal(123456789) from dual;

one hundred twenty-three million four hundred fifty-six thousand
seven hundred
 eighty-nine

TechRepublic’s Oracle newsletter covers automating Oracle utilities, generating database alerts, solving directed graph problems, and more. Automatically subscribe today!