SHARE

Understand the effects of NULLs on inner and outer JOINs

In this SQL Server tip, Arthur Fuller explains the effects of NULLs, and the different results obtained by inner and outer JOINs.

Written By

arthurfuller

Nov 27, 2006

We may earn from vendors via affiliate links or sponsorships. This might affect product placement on our site, but not the content of our reviews. See our Terms of Use for details.

SQL Server developers must understand two things: the
difference between inner and outer JOINs and the
effect of NULLs upon these choices. This tip
addresses both subjects.

Get SQL tips in your inbox TechRepublic’s SQL Server newsletter, delivered each Tuesday, contains hands-on tips that will help you become more adept with this powerful relational database management system.

Automatically sign up today!

Let’s take it one step at a time — first, the differences
between inner and outer JOINs, and then the
implications of NULLs on our queries. In order to
examine the difference between inner and outer JOINs,
we’ll need some sample data.

Suppose you have two tables, T1 and T2, each of which contain a column called TestJoin.
The column contains the values 1, 2, and 3 in table T1, while the column contains NULL,
2, and 3 in
table T2. Inner JOINing these tables on the column TestJoin returns two rows, since you cannot join the value
1 to the NULL. An outer JOIN is the opposite. For instance, the following yields only those rows where
Customers have placed Orders.

SELECT CustomerID, OrderID FROM Customers Inner Join Orders On Customers.CustomerID = Orders.CustomerID

(In many businesses, this may be valid; other
businesses may define a Customer as one who has placed at least one Order. That
is a business rule that ought to be reflected in the database definition, not
the middle tier, in my opinion, though some would differ.)

An outer join would read:

SELECT CustomerID, OrderID FROM Customers LEFT OUTER JOIN Orders OnOrders.CustomerID = Customers.CustomerID

which would result in a list of all
customers, irrespective of whether they placed orders.

Run the simple script in
Listing A to create some test tables.

Here is the SQL to compare the inner and outer JOINs:

SELECT  InnerOuter.T1.T1ID, InnerOuter.T1.NameAS Name1,         InnerOuter.T2.T2ID, InnerOuter.T2.NameAS Name2FROM    InnerOuter.T1 LEFT OUTER JOIN               InnerOuter.T2 ON InnerOuter.T1.T1ID = InnerOuter.T2.T1ID

When you run both queries, you will confirm that the inner
JOIN returns two rows, while the outer JOIN returns three. This principle holds
true even if you add a third table, as you can see from Listing B.

If you join the third table in the SELECT statements, you
find the same results: two rows from the inner JOIN and three from the outer
JOIN. See Listing C.

Be aware of the effect of NULLs on
aggregate functions. To observe this effect, add a NULL money column to T2
called Amount. Add a few more rows to T2 something like this:

T2ID  T1ID  Name  Amount
1     1     T2Text1     NULL
2     NULL  T2Text2     NULL
3     3     T2Text3     120.0000
4     1     T2Text4     123.0000
5     1     T2Text5     234.0000
6     3     T2Text6     345.0000
NULL  NULL  NULL  NULL

The T1ID value 1 has three rows, one of whose Amount is
NULL. The T2ID value 3 has two rows, neither of which are NULL. (You can add as
many rows as you wish, but ensure that there are different numbers of rows with
NULL values in the Amount column.)

Now run this query:

SELECT  InnerOuter.T1.T1ID,         Sum(InnerOuter.T2.Amount)AS TotalAmount,         Count(InnerOuter.T2.T1ID)AS NumberOfRows,        Count(InnerOuter.T2.Amount)AS NumberOfAmounts,        Avg(InnerOuter.T2.Amount)AS AverageAmount FROM    InnerOuter.T1 LEFT OUTER JOIN        InnerOuter.T2 ON InnerOuter.T1.T1ID = InnerOuter.T2.T1IDGROUP BY         InnerOuter.T1.T1ID

The above data produces this result set:

1     357.00      3     2     178.50
2     NULL  0     0     NULL
3     465.00      2     2     232.50

The first returned row indicates that if you count the T1ID
column, you get the correct number of rows, but if you count the Amounts, you
get the number of non-NULL values. Note that the averages reflect the number of
non-NULL values, not the number of rows.

You might think these examples are artificial, since given
the relationships between the tables, it would seem
unlikely that the foreign key columns allow NULLs.
However, I see this sort of thing all the time in databases. HR hires a new
employee but has yet to assign her to a department. Or, a new customer is added
to a database before a sales rep is assigned to that
customer.

There are various ways to work around the absence of such
information. In my opinion, the worst approach is to add what is known
colloquially as the 0^th row, in which a “fake” row is
added to every lookup table, and this zero value is used whenever a foreign key
is unknown—thus, the foreign keys will never be NULL. This approach has two
significant problems: (1) It conceptually falsifies reality by introducing a new
type of data into the table, when relational theory insists that a table holds
exactly one type of data; (2) It complicates queries because you will always
have to eliminate such rows from standard queries.

Miss a tip?

Check out the SQL Server archive, and catch up on the most recent editions of Arthur Fuller’s column.