On the surface, it may appear that the SQL
clauses IN and EXISTS are interchangeable. However, they’re quite
different in how they handle NULL values and may give different
results. The problem comes from the fact that, in an Oracle
database, a NULL value means unknown, so any comparison or
operation against a NULL value is also NULL, and any test that
returns NULL is always ignored. For example, neither one of these
queries return any rows:

select ‘true’ from dual where 1 = null;
select ‘true’ from dual where 1 != null;

The value 1 is neither equal nor not equal to
NULL. Only IS NULL would return true on a NULL value and return a
row.

select ‘true’ from dual where 1 is null;
select ‘true’ from dual where null is null;

When you use IN, you’re telling SQL to take a
value and compare it against every value or set of values in a list
using =. If any NULL values exist, a row will not be returned–even
if both values are NULL.

select ‘true’ from dual where null in (null);
select ‘true’ from dual where (null,null) in ((null,null));
select ‘true’ from dual where (1,null) in ((1,null));

An IN is functionally equivalent to the = ANY
clause:

select ‘true’ from dual where null = ANY
(null);
select ‘true’ from dual where (null,null) = ANY
((null,null));
select ‘true’ from dual where (1,null) = ANY ((1,null));

When you use an equivalent form of EXISTS, SQL
counts rows and ignores the value(s) in the subquery–even if you
return NULL.

select ‘true’ from dual where exists (select null
from dual);
select ‘true’ from dual where exists (select 0 from dual where
null is null);

The IN and EXISTS are logically the same. The
IN clause compares values returned by the subquery and filters out
rows in the outer query; the EXISTS clause compares values and
filters out rows inside the subquery. In the case of NULL values,
the resulting set of rows is the same.

select ename from emp where empno in (select mgr
from emp);
select ename from emp e where exists (select 0 from emp where mgr
= e.empno);

But problems arise when the logic is reversed
to use NOT IN and NOT EXISTS, which return different sets of rows
(the first query returns 0 rows; the second returns the intended
data–they aren’t the same query):

select ename from emp where empno not in (select
mgr from emp);
select ename from emp e where not exists (select 0 from emp where
mgr =
 e.empno);

The NOT IN clause is virtually equivalent to
comparing each value with = and failing if any test is FALSE or
NULL. For example:

select ‘true’ from dual where 1 not in
(null,2);
select ‘true’ from dual where 1 != null and 1 != 2;
select ‘true’ from dual where (1,2) not in ((2,3),(2,null));
select ‘true’ from dual where (1,null) not in ((1,2),(2,3));

These queries don’t return any rows. The second
is more obvious, 1 != NULL is NULL, so the whole WHERE condition is
false for that row. While these would work:

select ‘true’ from dual where 1 not in (2,3);
select ‘true’ from dual where 1 != 2 and 1 != 3;

You can still use the NOT IN query from before,
as long as you prevent NULL from being returned in the results
(again, these both work, but I’m assuming empno is not null, which
is a good assumption in this case):

select ename from emp where empno not in (select
mgr from emp where mgr is not
 null);
select ename from emp where empno not in (select nvl(mgr,0) from
emp);

By understanding the difference between IN,
EXISTS, NOT IN, and NOT EXISTS, you can avoid a very common problem
when NULLs appear in the data of a subquery.

Scott Stephens worked for Oracle for more than 13 years in technical support, e-commerce, marketing, and software development.

Subscribe to the Data Insider Newsletter

Learn the latest news and best practices about data science, big data analytics, artificial intelligence, data security, and more. Delivered Mondays and Thursdays

Subscribe to the Data Insider Newsletter

Learn the latest news and best practices about data science, big data analytics, artificial intelligence, data security, and more. Delivered Mondays and Thursdays