Part one discussed using SQL to solve directed graph problems. Another kind of directed graph is a state machine. In a state
machine, each node is called a state and represents the current
status of a system, object, or variable. There are vectors between
each state that represent a change in state caused by an external
event.

For example, Windows event-based architecture
or JavaScript
events are designed around the idea that a GUI contains a set of
objects that respond to various user-input or system-generated
events.


Even a programming language compiler is a state machine. A compiler
recognizes tokens by shifting between different states until a
token is “recognized.” A popular utility, lex, was often used to
take a set of tokens, or reserved words, and convert it and
optimize it into a state machine in C. You can duplicate this same
process in SQL, with the advantage that the generated code can be
in any language.

First, create a table of tokens:

create table tokens
(
    id$         integer
constraint tokens_pk primary key,
   
token       varchar2(20)
);

A state machine needs to store a set of states
and a set of transitions between states. For this example, a single
input character will represent each transition:

create table states
(
    id$         integer
constraint states_pk primary key,
    accept#     integer,

    code        varchar2(20),

    constraint  states_accept_fk foreign
key (accept#) references tokens(id$),
    constraint  states_accept_uq unique
(accept#)
);

create table trans
(
    state$#     integer
not null,
    input$      char
not null,
    next#       integer,

    constraint  trans_pk primary key
(state$#,input$),
    constraint  trans_state_fk foreign
key (state$#) references states(id$),
    constraint  trans_next_fk foreign key
(next#) references states(id$)
);

The problem is to build up these tables of
states and transitions based on the set of tokens. As in the
previous directed graph problem, we’ll start with some initial node
(in this case, the initial state) and try to find all the routes
between different states. Each route is the set of characters
recognized so far. For example:

state 0: initial state, empty string
state 1: ‘A’ state
state 2: ‘AA’ state
state 3: ‘AB’ state
state 4: ‘ABC’ state

You can populate the transition table by
associating the letter (or regular expression in a more realistic
program) with the state we’re currently in and the state at which
we’ll arrive. Let’s populate and optimize the table with this
code:

Declare
    l_state pls_integer := 0;
    l_len pls_integer := 1;
begin
    — populate states
    insert into states (id$) values
(l_state);
    loop
        insert into
states(id$,code)
            select
l_state+rownum,code
              from

              (

                  select
distinct substr(token,1,l_len) code
                    from
tokens
                   where
length(token) >= l_len
              );

        exit when
sql%rowcount = 0;
        l_state := l_state
+ sql%rowcount;
        l_len := l_len +
1;
    end loop;
    — populate transitions
    insert into trans
(state$#,input$,next#)
        select
s1.id$,substr(s2.code,-1,1),s2.id$
          from
states s1,states s2
         where
(s1.code = substr(s2.code,1,length(s2.code)-1))
            or
(s1.id$ = 0 and length(s2.code) = 1);
    — map accept states to tokens
    update states
       set accept# = (select
id$ from tokens where token = states.code);
end;
/
show errors;

The first statement inserts the initial state.
Then, for cutting up the tokens into different lengths, we can
enumerate all the states. Next, we populate the transition table
with the character required to get from one state to the next.
Finally, we attach the token id to the final state to indicate that
this state represents a complete token. For example, if we add
three tokens, where 1 = AAA, 2 = ABC, 3 = ASC, then running the
above code will result in data, which you can see in Table A.

You can use this table to generate code in many
languages that process characters from strings efficiently. The
only code that gets used in a recognition program is the “accept”
column for states, and the input and next state columns from the
transition table. For example, in C:

#include <stdio.h>

typedef struct state state;
typedef struct trans trans;

struct state
{
    int
accept;               /*
token value or 0 */
    int
offset;               /*
offset into transition table */
    int count;
};
struct trans
{
    char
input;               /*
input character */
    int   
next;              /*
next state */
};

state statetbl[8] =
{
    {0,0,1},                  /*
0: initial state */
    {0,1,3},                  /*
1: ‘A’ state */
    {0,4,1},                  /*
2: ‘AA’ state */
    {0,5,1},                  /*
3: ‘AB’ state */
    {0,6,1},                  /*
4: ‘AS’ state */
    {1,7,0},                  /*
5: ‘AAA’ state */
    {2,7,0},                  /*
6: ‘ABC’ state */
    {3,7,0},                  /*
7: ‘ASC’ state */
};
trans trantbl[7] =
{
   {‘A’,1},                   /*
0: initial -> ‘A’ */
   {‘A’,2},                   /*
1: ‘A’ -> ‘AA’ */
   {‘B’,3},                   /*
2: ‘A’ -> ‘AB’ */
   {‘S’,4},                   /*
3: ‘A’ -> ‘AS’ */
   {‘A’,5},                   /*
4: ‘AA’ -> ‘AAA’ */
   {‘C’,6},                   /*
5: ‘AB’ -> ‘ABC’ */
   {‘C’,7},                   /*
6: ‘AS’ -> ‘ASC’ */
};

/* given a current state and input character, return the next
state */
int move(int state,char input)
{
    int i;
    for (i =
0;i<statetbl[state].count;i++)
    {
        if
(trantbl[statetbl[state].offset+i].input == input)
           return
trantbl[statetbl[state].offset+i].next;
    }
    return
-1;      /* unknown transition
*/
}

int main(int argc,char* argv[])
{
    int i,next,state;
    if (argc < 2)
    {
        fprintf(stderr,”Usage:
%s \”tokens…\”\n”,argv[0]);
        return 1;
    }

    /* parse the first argument */
    state = 0;
    for (i=0;argv[1][i];i++)
    {
        next =
move(state,argv[1][i]);
        if (next == -1)
state = 0;
        else
        {
            state
= next;
            if
(statetbl[state].accept)
                printf(“recognized
token#%d\n”,statetbl[state].accept);
        }
    }
    return 0;
}

Then, you can run this program and get tokens
recognized on the command line:

$ cc -o token token.c

$ token  “ABC ASC FEF ASC”
recognized token#2
recognized token#3
recognized token#3

TechRepublic’s Oracle newsletter covers automating Oracle utilities, generating database alerts, solving directed graph problems, and more. Automatically subscribe today!