Data Management

Oracle Tip: Solving directed graph problems with SQL, part 1

There are many algorithms and papers about how to solve problems of traversing along each possible route and finding the route that is the shortest or costs the least. Most of these algorithms are procedural or resort to recursion to solve. However, SQL's declarative language makes solving complex directed graph problems easier.

A common advanced computer science problem can be described under the heading of "directed graphs." A directed graph is a finite set of nodes connected by a set of vectors or edges. For example, a node can be imagined as a "city" and each vector as a "flight" between two cities.

There are many algorithms and papers about how to solve problems of traversing along each possible route and finding the route that is the shortest or costs the least. Most of these algorithms are procedural or resort to recursion to solve. However, SQL's declarative language makes solving complex directed graph problems easier and doesn't require much coding.

Let's take the example of flights between cities and create a table to hold some imaginary data:

create table airports
(
    code        char(3) constraint airports_pk primary key,
    description varchar2(200)
);

insert into airports values ('LHR','London Heathrow, UK');
insert into airports values ('JFK','New York-Kennedy, USA');
insert into airports values ('GRU','Sao Paulo, Brazil');

create table fares
(
    depart      char(3),
    arrive      char(3),
    price       number,
    constraint fares_pk primary key (depart,arrive),
    constraint fares_depart_fk foreign key (depart) references airports,
    constraint fares_arrive_fk foreign key (arrive) references airports
);

insert into fares values('LHR','JFK',700);
insert into fares values('JFK','GRU',600);
insert into fares values('LHR','GRU',1500);
insert into fares values('GRU','LHR',1600);

You cannot use CONNECT BY syntax to resolve how to get from London to Sao Paulo because there is data that creates a loop in the graph (flying back to Sao Paulo):

select * from fares connect by prior arrive = depart start with depart = 'LHR';
ERROR:
ORA-01436: CONNECT BY loop in user data

To solve directed graph problems, we need to create a temporary table to hold all the possible paths between two nodes. We have to be careful not to duplicate paths we already processed and, in this case, we don't want paths that lead to the same place we started. I'd also like to trace the number of hops it took to get to the destination, as well as a description of the route taken.

The temporary table is created using the following:

create global temporary table faretemp
(
    depart      char(3),
    arrive      char(3),
    hops        integer,
    route       varchar2(30),
    price       number,
    constraint faretemp_pk primary key (depart,arrive)
);

A simple view can somewhat simplify the code used in this example. The view can calculate the data for the next hop from a path in faretemp based on a single hop in fares:

create or replace view nexthop
as
    select src.depart,
           dst.arrive,
           src.hops+1 hops,
           src.route||','||dst.arrive route,
           src.price + dst.price price
      from faretemp src,fares dst
     where src.arrive = dst.depart
       and dst.arrive != src.depart;
/
show errors;

The algorithm is fairly straightforward. First, populate the faretemp table with the data from the fare table as the initial hop. Then, take all the values we just inserted and use them to build all the possible two-hop paths. Keep repeating as long as a new path can be created between two nodes. The loop will exit when all possible paths between nodes are described. We could also restrict the first insert to reduce the amount of data being loaded if we were only interested in a certain starting condition. Here is the code for just discovering paths:

truncate table faretemp;
begin
    — initial connections
    insert into faretemp
     select depart,arrive,1,depart||','||arrive,price from fares;
    while sql%rowcount > 0 loop
        insert into faretemp
            select depart,arrive,hops,route,price from nexthop
             where (depart,arrive)
                   not in (select depart,arrive from faretemp);
    end loop;
end;
/
show errors;

select * from faretemp order by depart,arrive;

You can view the output in Table A.

There's one small problem with the above data. The data is the set of shortest routes between points (least number of hops). However, the hop from London to Sao Paulo isn't the cheapest possibility.

To solve for the least expensive fare, we need to add an update into our loop that replaces a route with a cheaper route when it's found in a hop. The code for that looks like this:

truncate table faretemp;
declare
    l_count integer;
begin
    — initial connections
    insert into faretemp
        select depart,arrive,1,depart||','||arrive,price from fares;
    l_count := sql%rowcount;
    while l_count > 0 loop
        update faretemp
           set (hops,route,price) =
              (select hops,route,price from nexthop
                where depart = faretemp.depart
                  and arrive = faretemp.arrive)
         where (depart,arrive) in
             (select depart,arrive from nexthop
               where price < faretemp.price);
        l_count := sql%rowcount;
        insert into faretemp
            select depart,arrive,hops,route,price from nexthop
             where (depart,arrive)
               not in (select depart,arrive from faretemp);
        l_count := l_count + sql%rowcount;
    end loop;
end;
/
show errors;

select * from faretemp order by depart,arrive;

You can view the output in Table B.

The algorithm noticed that the LHR,JFK,GRU route was cheaper than the LHR,GRU route and replaced it. The loop will exit once there are no cheaper fares and no other possible routes.

TechRepublic's Oracle newsletter covers automating Oracle utilities, generating database alerts, solving directed graph problems, and more. Automatically subscribe today!

Editor's Picks