When data is stored in a normal table, the
rows are physically stored in allocated blocks in the order in
which you insert them into the database. For example, if you have a
table of information about employees, the employees’ names would
typically be stored in the table in the order in which they were
added to the table.

If you have a large number of employees, the
table would gradually get slower. You could speed up employee
queries if you choose a column that gives a relatively equal
distribution of values, such as the department number of the
employee, and if you create a cluster table.

In a cluster table, if the employees are in the
same department, the rows would physically be stored in the same
set of blocks. This makes queries for employees faster since it
requires fewer database block reads to retrieve the employees for a
specific department. In the non-clustered table, you might have to
read every database block to find all the employees.

When you have a large number of keys, you’ll
start to see performance problems because now you have many cluster
blocks. One way to get around this is by providing a hash function
to restrict the number of cluster blocks. A hash function takes a
numerical value and restricts the range to a predetermined number,
but it provides a relatively equal distribution of values. For
example, you might create a hash function on the department number
that only looks at the last two digits.

One problem with hash functions is that the
function value tends to randomize the order that rows are naturally
returned. You can usually fix this with an ORDER BY; however, there
are cases in which there are a large number of records. Oracle 10g fixes this problem by
allowing you to define a “natural order” to data so you can
retrieve hash cluster data in the desired order without
sorting.

For example, suppose you maintain a database of
credit card transactions. You decide that using the credit card
number as a cluster key will give you a good distribution of data.
But, because there are a large number of credit cards, you use a
hash function to restrict the number of cluster blocks. Since you
want your data to come back in chronological order for most of your
reports, use a sorted hash cluster rather than using ORDER BY in
every query. Here’s the syntax:

create cluster credit_cluster
(
    card_no     varchar2(16),

    transdate   date sort
)
hashkeys 10000 hash is ora_hash(card_no)
size 256;

create table credit_orders
(
    card_no     varchar2(16),

    transdate   date,
    amount      number

)
cluster credit_cluster(card_no,transdate);

alter session set nls_date_format = “YYYYMMDDHH24MISS”;
insert into credit_orders (card_no,transdate,amount)
    values
(‘4111111111111111′,’20050131000123’,57.99);
insert into credit_orders (card_no,transdate,amount)
    values
(‘4111111111111111′,’20050130071216’,16.59);
insert into credit_orders (card_no,transdate,amount)
    values
(‘4111111111111111′,’20050131111111’,39.00);
insert into credit_orders (card_no,transdate,amount)
    values
(‘4111111111111111′,’20050130081001’,25.16);

Notice that I use the new function ORA_HASH to
create a numeric hash value for the credit card. Now, you can
simply query the data for a single credit card, and it
automatically comes back in sorted order, as shown in Listing A.

TechRepublic’s Oracle newsletter covers automating Oracle utilities, generating database alerts, solving directed graph problems, and more. Automatically subscribe today!