Data Management

Understand the ambiguities of compound JOINs in SQL

Arthur Fuller explains why he wants developers to think about this general principle: Compound JOINs become nonsensical when one of the clauses references only one of the tables.

This article begins with a test. Look at the SQL below and determine, without using your computer, what the result set will look like.

This code uses the sample Northwind database, and to save you the trouble of looking them up, I'll remind you of the following statistics: the Customers table has 91 rows; the Orders table has 830 rows; seven Orders contain the CustomerID 'WOLZA'.

SELECT dbo.Customers.CustomerID, dbo.Orders.OrderID
FROM dbo.Customers LEFT OUTER JOIN dbo.Orders
ON dbo.Customers.CustomerID = dbo.Orders.CustomerID AND
dbo.Customers.CustomerID = N'WOLZA'

How many rows would you expect this code to return? 7? 91? If you guessed either of these, you're incorrect. The query returns 97 rows.

Now try this one:

SELECT dbo.Customers.CustomerID, dbo.Orders.OrderID
FROM dbo.Customers RIGHT OUTER JOIN dbo.Orders ON
dbo.Customers.CustomerID = dbo.Orders.CustomerID AND
dbo.Customers.CustomerID = N'WOLZA'

This query returns 830 rows, one for each row in the Orders table. All but seven contain a NULL CustomerID. To put it another way, the AND clause has no effect.

Looking at the SQL above, you might surmise that the AND clause is testing the wrong table's CustomerID column.

If you run the following query, you'll get exactly the same result—830 rows, seven of which contain the CustomerID 'WOLZA'. Thus, it makes no difference which table you specify in the AND clause.

SELECT dbo.Customers.CustomerID, dbo.Orders.OrderID
FROM dbo.Customers RIGHT OUTER JOIN dbo.Orders ON
dbo.Customers.CustomerID = dbo.Orders.CustomerID AND
dbo.Orderss.CustomerID = N'WOLZA'

To get rid of the 823 rows containing a NULL, you could add a WHERE clause, like this:

SELECT dbo.Customers.CompanyName, dbo.Orders.OrderID, dbo.Orders.CustomerID
FROM dbo.Customers RIGHT OUTER JOIN dbo.Orders ON
dbo.Customers.CustomerID = dbo.Orders.CustomerID AND
dbo.Orders.CustomerID = N'WOLZA'
WHERE (dbo.Customers.CustomerID IS NOT NULL)

We finally get exactly seven rows. In fact, it turns out that there's another way to achieve the same thing: by adding a WHERE clause to test the CustomerID column:

SELECT dbo.Customers.CompanyName, dbo.Orders.OrderID, dbo.Orders.CustomerID
FROM dbo.Customers RIGHT OUTER JOIN dbo.Orders ON
dbo.Customers.CustomerID = dbo.Orders.CustomerID AND
dbo.Orders.CustomerID = N'WOLZA'
WHERE (dbo.Orders.CustomerID = N'WOLZA')

And yet, if all we were after in the first place was the seven orders corresponding to 'WOLZA', we could have obtained them with a simple INNER JOIN.

SELECT dbo.Customers.CompanyName, dbo.Customers.CustomerID, dbo.Orders.OrderID
FROM dbo.Customers INNER JOIN dbo.Orders ON
dbo.Customers.CustomerID = dbo.Orders.CustomerID
WHERE (dbo.Customers.CustomerID = N'WOLZA')

The point isn't that you should use INNER JOINs to get the list of seven WOLZA orders—or that OUTER JOINs are pointless (we've all seen situations in which they deliver exactly what we require). The point is that compound JOINs become nonsensical when one of the clauses references only one of the tables. Think about that as a general principle.

TechRepublic's free SQL Server newsletter, delivered each Tuesday, contains hands-on tips that will help you become more adept with this powerful relational database management system. Automatically subscribe today!

Editor's Picks

Free Newsletters, In your Inbox