Banking

Query data faster using sorted hash clusters

When you have a large number of keys, you'll start to see performance problems because now you have many cluster blocks. One way to get around this is by providing a hash function to restrict the number of cluster blocks.

When data is stored in a normal table, the rows are physically stored in allocated blocks in the order in which you insert them into the database. For example, if you have a table of information about employees, the employees' names would typically be stored in the table in the order in which they were added to the table.

If you have a large number of employees, the table would gradually get slower. You could speed up employee queries if you choose a column that gives a relatively equal distribution of values, such as the department number of the employee, and if you create a cluster table.

In a cluster table, if the employees are in the same department, the rows would physically be stored in the same set of blocks. This makes queries for employees faster since it requires fewer database block reads to retrieve the employees for a specific department. In the non-clustered table, you might have to read every database block to find all the employees.

When you have a large number of keys, you'll start to see performance problems because now you have many cluster blocks. One way to get around this is by providing a hash function to restrict the number of cluster blocks. A hash function takes a numerical value and restricts the range to a predetermined number, but it provides a relatively equal distribution of values. For example, you might create a hash function on the department number that only looks at the last two digits.

One problem with hash functions is that the function value tends to randomize the order that rows are naturally returned. You can usually fix this with an ORDER BY; however, there are cases in which there are a large number of records. Oracle 10g fixes this problem by allowing you to define a "natural order" to data so you can retrieve hash cluster data in the desired order without sorting.

For example, suppose you maintain a database of credit card transactions. You decide that using the credit card number as a cluster key will give you a good distribution of data. But, because there are a large number of credit cards, you use a hash function to restrict the number of cluster blocks. Since you want your data to come back in chronological order for most of your reports, use a sorted hash cluster rather than using ORDER BY in every query. Here's the syntax:

create cluster credit_cluster
(
    card_no     varchar2(16),
    transdate   date sort
)
hashkeys 10000 hash is ora_hash(card_no)
size 256;

create table credit_orders
(
    card_no     varchar2(16),
    transdate   date,
    amount      number
)
cluster credit_cluster(card_no,transdate);

alter session set nls_date_format = "YYYYMMDDHH24MISS";
insert into credit_orders (card_no,transdate,amount)
    values ('4111111111111111','20050131000123',57.99);
insert into credit_orders (card_no,transdate,amount)
    values ('4111111111111111','20050130071216',16.59);
insert into credit_orders (card_no,transdate,amount)
    values ('4111111111111111','20050131111111',39.00);
insert into credit_orders (card_no,transdate,amount)
    values ('4111111111111111','20050130081001',25.16);

Notice that I use the new function ORA_HASH to create a numeric hash value for the credit card. Now, you can simply query the data for a single credit card, and it automatically comes back in sorted order, as shown in Listing A.

TechRepublic's Oracle newsletter covers automating Oracle utilities, generating database alerts, solving directed graph problems, and more. Automatically subscribe today!

Editor's Picks

Free Newsletters, In your Inbox