When coding a SQL statement with tables in
master-detail relationships, it’s common to have to decide whether
to write the query using the WHERE EXISTS (. . .) clause or the
WHERE value IN (. . .) clause. You may resist using WHERE EXISTS
because it has the awkward syntax of returning a value, which you
always ignore.
However, there’s a difference when using
rule-based optimization. You can determine the performance of a
rule-based query by understanding which table is the driving table
and how many rows each part returns.
When you write a query using the IN clause,
you’re telling the rule-based optimizer that you want the inner
query to drive the outer query (think: IN = inside to outside). For
example, to query the 14-row EMP table for the direct reports to
the employee KING, you could write the following:
select ename from emp e
where mgr in (select empno from emp where
ename = ‘KING’);
Here’s the EXPLAIN PLAN for this query:
OBJECT OPERATION
———- —————————————-
SELECT
STATEMENT()
NESTED
LOOPS()
EMP TABLE
ACCESS(FULL)
EMP TABLE
ACCESS(BY INDEX ROWID)
PK_EMP INDEX(UNIQUE
SCAN)
This query is virtually equivalent to this:
select e1.ename from emp e1,(select empno from emp
where ename = ‘KING’) e2
where e1.mgr = e2.empno;
You can write the same query using EXISTS by
moving the outer query column to a subquery condition, like
this:
select ename from emp e
where exists (select 0 from emp where
e.mgr = empno and ename = ‘KING’);
When you write EXISTS in a where clause, you’re
telling the optimizer that you want the outer query to be run
first, using each value to fetch a value from the inner query
(think: EXISTS = outside to inside).
The EXPLAIN PLAN result for the query is:
OBJECT OPERATION
———- —————————————-
SELECT
STATEMENT()
FILTER()
EMP TABLE
ACCESS(FULL)
EMP TABLE
ACCESS(BY INDEX ROWID)
PK_EMP INDEX(UNIQUE
SCAN)
This is virtually similar to the PL/SQL
code:
set serveroutput on;
declare
l_count integer;
begin
for e in (select mgr,ename from emp)
loop
select count(*)
into l_count from emp
where e.mgr
= empno and ename = ‘KING’;
if l_count != 0
then
dbms_output.put_line(e.ename);
end if;
end loop;
end;
To determine which clause offers better
performance in rule-based optimization, consider how many rows the
inner query will return in comparison to the outer query. In many
cases, EXISTS is better because it requires you to specify a join
condition, which can invoke an INDEX scan. However, IN is often
better if the results of the subquery are very small. You usually
want to run the query that returns the smaller set of results
first.
Some people avoid the EXISTS clause because of
the requirement to return a result from the query–even though the
result is never used. Depending on personal style, people often use
‘X,’ 1, 0, or null. From looking at the EXPLAIN PLAN output, it
appears that the optimizer throws out whatever value you enter and
uses 0 all the time. Many developers get into the habit of always
entering some constant value.
If you want to run your own tests, or see other
examples, here are the two scripts I used:
REM — explain.sql – view plan from
PLAN_TABLE
set feedback off
set verify off
set pages 2000
column operation format a40
column object format a10
TTITLE * STATEMENT_ID = ‘&1’ *
select object_name object,
lpad(‘
‘,level-1)||operation||'(‘||options||’)’ operation
from plan_table
start with id = 0 and statement_id = ‘&1’
connect by prior id = parent_id and statement_id =
‘&1’;
REM — exists.sql – examples with EXPLAIN
PLAN
REM — IN vs. EXISTS
REM — if you don’t have a PLAN_TABLE, run …
REM — @?/rdbms/admin/xplan
alter session set optimizer_goal = rule;
truncate table plan_table;
REM — find direct reports to KING
explain plan set statement_id = ‘IN’ for
select ename from emp e
where mgr in (select empno from emp where
ename = ‘KING’);
explain plan set statement_id = ‘JOIN-IN’ for
select e1.ename from emp e1,(select empno from emp where ename =
‘KING’) e2
where e1.mgr = e2.empno;
explain plan set statement_id = ‘EXISTS’ for
select ename from emp e
where exists (select 0 from emp where
e.mgr = empno and ename = ‘KING’);
explain plan set statement_id = ‘=’ for
select ename from emp e
where mgr = (select empno from emp where
ename = ‘KING’);
explain plan set statement_id = ‘JOIN1’ for
select e1.ename from emp e1,emp e2
where e1.mgr = e2.empno
and e2.ename = ‘KING’;
REM — find employees with greater than average salaries
explain plan set statement_id = ‘>’ for
select ename from emp e where e.sal > (select avg(sal) from
emp);
explain plan set statement_id = ‘JOIN2’ for
select e1.ename from emp e1,(select avg(sal) sal from emp)
e2
where e1.sal > e2.sal;
@@explain IN
@@explain JOIN-IN
@@explain EXISTS
@@explain =
@@explain JOIN1
@@explain >
@@explain JOIN2
TechRepublic’s Oracle newsletter covers automating Oracle utilities, generating database alerts, solving directed graph problems, and more. Automatically subscribe today!