Data Management

Kick it up a notch! Faster Oracle queries using parallel features

Oracle's parallel features give you the power to turbocharge your queries. Donald Burleson explains how.


Oracle version 7.2 introduced some powerful new features to allow parallel processes to be used against the Oracle database. These features include parallel create table as select, parallel query, and parallel index building. Remember that you don't need to have parallel processors (SMP or MPP) in order to use and benefit from parallel processing. Even on the same processor, multiple processes can speed up queries. Oracle parallel query option can be used with any SQL SELECT statement, as long as the query performs a full-table scan on the target table.

Parallel queries are most useful in distributed databases where a single logical table has been partitioned into smaller tables at each remote node. For example, a customer table that is ordered by customer name may be partitioned into a customer table at each remote database, such that we have a phoenix_customer, a los_angeles_customer, and so on. This approach is very common with distributed databases where local autonomy of processing is important. However, what about the needs of the corporate headquarters? How can they query all of these remote tables as a single unit and treat the logical customer table as a single entity?

Did you miss the first intallment?
If you didn't catch Don Burleson’s article describing the benefits of parallel queries in Oracle databases, you can check it out here.

While this “splitting” of a table according to a key value violates normalization, it can dramatically improve performance for individual queries. For large queries that may span many logical tables, the isolated tables can then easily be reassembled using Oracle’s parallel query facility:
Create view all_customer as
select * from phoenix_customer@phoenix
UNION ALL
select * from los_angeles_customer@los_angeles
UNION ALL
select * from rochester_customer@rochester;
 

Note: The “@” references refer to SQL*Net service names for the remote hosts.
We can now query the all_customer view as if it were a single database table, and Oracle parallel query will automatically recognize the UNION ALL parameter and fire off simultaneous queries against each of the three base tables. It is important to note that the distributed database manager will direct that each query is processed at the remote location, while the query manager waits until each remote node has returned its result set. For example, the following query will assemble the requested data from the three tables in parallel, with each query being separately optimized. The result set from each sub-query is then merged by the query manager:
select customer_name
from all_customer
where
total_purchases > 5000;



Data warehouses generally employ parallel technology to perform warehouse loading and query functions. These queries include:
  1. Parallel backup/recovery—Some parallel tools are capable of rates in excess of 40 BG/hour.
  2. Parallel query (SMP & MPP)—Multiple processes are used to retrieve table data.
  3. Parallel loading—Multiple processes are used to simultaneously load many tables.
  4. Parallel indexing—Multiple processes are used to create indexes.

For parallel query, the most powerful approach deals with the use of the SQL UNION verb in very large databases (VLDBs). In most large Oracle data warehouses, it is not uncommon to logically partition a single table into many smaller tables in order to improve query throughput. For example, a sales table that is ordered by date_of_sale may be partitioned into 1997_sales, 1998_sales, and 1999_sales tables. This approach is very common with data warehouse applications in which single logical tables might have millions of rows. While this “splitting” of a table according to a key value violates normalization, it can dramatically improve performance for individual queries. For large queries that may span many logical tables, the isolated tables can then be easily reassembled using Oracle’s parallel query facility:
 
Create view all_sales as
       select * from 1997_sales
       UNION ALL
       select * from 1998_sales
       UNION ALL
       select * from 1999_sales;
 

We can now query the all_sales view as if it were a single database table, and Oracle parallel query will automatically recognize the UNION ALL parameter. It will fire off simultaneous queries against each of the three base tables. For example, the following query will assemble the requested data from the three tables in parallel, with each query being separately optimized. The result set from each sub-query is then merged by the query manager:



select customer_name
from all_sales
where
sales_amount > 5000;



Don Burleson is a leading database author with expertise in data warehouse technology and tuning. He has more than 15 years' experience as a database administrator, working with some of the world’s most sophisticated Oracle systems. Don runs a Web site and can be reached by e-mail at burleson@frontiernet.net.

Editor's Picks

Free Newsletters, In your Inbox