If you’re new to SQL, joins can be a daunting concept. There are many different types of joins and seemingly little information available that explains and compares them. But after you digest a few basic concepts, the practice of joins actually isn’t very difficult.

Let’s look at the fundamentals that should give you a solid, functional knowledge of how to use the JOIN statement. We won’t consider some more advanced JOIN statement concepts, such as aliasing, join order, conceptual joins, and join hints. While the descriptions I’ll present in this article relate directly to Microsoft SQL Server, they also apply to any SQL–92-compliant database.

The JOIN concept
JOIN is a query clause that can be used with the SELECT, UPDATE, and DELETE data query statements to simultaneously affect rows from multiple tables. There are several distinct types of JOIN statements that return different data result sets.

Joined tables must each include at least one field in both tables that contain comparable data. For example, if you want to join a Customer table and a Transaction table, they both must contain a common element, such as a CustomerID column, to serve as a key on which the data can be matched. Tables can be joined on multiple columns so long as the columns have the potential to supply matching information. Column names across tables don’t have to be the same, although for readability this standard is generally preferred.

When you do use like column names in multiple tables, you must use fully qualified column names. This is a “dot” notation that combines the names of tables and columns. For example, if I have two tables, Customer and Transaction, and they both contain the column CustomerID, I’d use the dot notation, as in Customer.CustomerID and Transaction.CustomerID, to let the database know which column from which table I’m referring.

Now that we’ve examined the basic theory, let’s take a look at the various types of joins and examples of each.

The basic JOIN statement
A basic JOIN statement has the following format:
SELECT Customer.CustomerID, TransID, TransAmt
FROM Customer JOIN Transaction
ON Customer.CustomerID = Transaction.CustomerID;

In practice, you’d never use the example above because the type of join is not specified. In this case, SQL Server assumes an INNER JOIN. You can get the equivalent to this query by using the statement:
SELECT Customer.CustomerID, TransID, TransAmt
FROM Customer, Transaction;

However, the example is useful to point out a few noteworthy concepts:

  • TransID and TransAmt do not require fully qualified names because they exist in only one of the tables. You can use fully qualified names for readability if you wish.
  • The Customer table is considered to be the “left” table because it was called first. Likewise, the Transaction table is the “right” table.
  • You can use more than two tables, in which case each one is “naturally” joined to the cumulative result in the order they are listed, unless controlled by other functionality such as “join hints” or parenthesis.
  • You may use WHERE and ORDER BY clauses with any JOIN statement to limit the scope of your results. Note that these clauses are applied to the results of your JOIN statement.
  • SQL Server does not recognize the semicolon (;), but I use it in the included examples to denote the end of each statement, as would be expected by most other RDBMSs.

The notorious CROSS JOIN
The CROSS JOIN has earned a bad reputation because it’s very resource intensive and returns results of questionable usefulness. When you use the CROSS JOIN, you’re given a result set containing every possible combination of the rows returned from each table. Take the following example:
SELECT CustomerName, TransDate, TransAmt
FROM Customer CROSS JOIN Transaction;

With the CROSS JOIN, you aren’t actually free to limit the results, but you can use the ORDER BY clause to control the way they are returned. If the tables joined in this example contained only five rows each, you would get 25 rows of results. Every CustomerName would be listed as associated with every TransDate and TransAmt.

I really did try to come up with examples where this function was useful, and they were all very contrived. However, I’m sure someone out there is generating lists of all their products in all possible colors or something similar, or we wouldn’t have this wonderful but dangerous feature.

The INNER JOIN drops rows
When you perform an INNER JOIN, only rows that match up are returned. Any time a row from either table doesn’t have corresponding values from the other table, it is disregarded. Because stray rows aren’t included, you don’t have any of the “left” and “right” nonsense to deal with and the order in which you present tables matters only if you have more than two to compare. Since this is a simple concept, here’s a simple example:
SELECT CustomerName, TransDate
FROM Customer INNER JOIN Transaction
ON Customer.CustomerID = Transaction.CustomerID;

If a row in the Transaction table contains a CustomerID that’s not listed in the Customer table, that row will not be returned as part of the result set. Likewise, if the Customer table has a CustomerID with no corresponding rows in the Transaction table, the row from the Customer table won’t be returned.

The OUTER JOIN can include mismatched rows
OUTER JOINs, sometimes called “complex joins,” aren’t actually complicated. They are so-called because SQL Server performs two functions for each OUTER JOIN.

The first function performed is an INNER JOIN. The second function includes the rows that the INNER JOIN would have dropped. Which rows are included depends on the type of OUTER JOIN that is used and the order the tables were presented.

There are three types of an OUTER JOIN: LEFT, RIGHT, and FULL. As you’ve probably guessed, the LEFT OUTER JOIN keeps the stray rows from the “left” table (the one listed first in your query statement). In the result set, columns from the other table that have no corresponding data are filled with NULL values. Similarly, the RIGHT OUTER JOIN keeps stray rows from the right table, filling columns from the left table with NULL values. The FULL OUTER JOIN keeps all stray rows as part of the result set. Here is your example:
SELECT CustomerName, TransDate, TransAmt
FROM Customer LEFT OUTER JOIN Transaction
ON Customer.CustomerID = Transaction.CustomerID;

Customer names that have no associated transactions will still be displayed. However, transactions with no corresponding customers will not, because we used a LEFT OUTER JOIN and the Customer table was listed first.

In SQL Server, the word OUTER is actually optional. The clauses LEFT JOIN, RIGHT JOIN, and FULL JOIN are equivalent to LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN, respectively.

Another addition to your SQL toolbox
Although the JOIN statement is often perceived as a complicated concept, you can now see that it’s a powerful timesaving resource that’s relatively easy to understand. Use this functionality to get related information from multiple tables with a single query and to skillfully reference normalized data. Once you’ve mastered JOINs, you can elegantly maneuver within even the most complex database.