Data Management

SQL JOINs make it easy to find and fix missing data

Many people are confused by various JOIN operations supported by SQL. Learn to master the SQL command line with this crash course in JOIN syntax.

In response to previous SQL lessons, TechRepublic members including Michael M. and Jason P. have posted comments requesting a basic explanation of the types of JOINs available in SQL.

If you use the SQL command line to mine the records in your tables, you’ll discover that a JOIN may be the best tool for finding the information you need. This week, I’ll explain the basic types of JOINs and how they work.

The INNER JOINS: Matching records only
In December 2001's "Another crash course in SQL," I showed you one way to extract information from two tables in one statement. In most implementations of SQL, you pull information from two tables by using a SELECT command in this form:
SELECT table1.column1, table1.column2,
table2.column1, table2.column2
FROM table1, table2
WHERE table1.keyfield=table2.keyfield

When you execute a statement like this, SQL creates a new table—the results recordset—based on the contents of table1 and table2. You specify which fields from the two tables you want to see in the results. But how many rows (records) will the new table contain?

The answer depends on the WHERE clause. In this sample statement, the condition (table1.keyfield=table2.keyfield) tells SQL to find records in both tables that contain matching values in the column named by keyfield. If one of the tables contains records that are orphaned—for which there is no corresponding record with a matching value in the other table—those records are ignored.

To illustrate this behavior, consider the two tables shown in Figure A.

Figure A
We’ll use these two tables to demonstrate how JOINs affect your queries.


Notice that the Donors table contains five records, each representing a different person. The DonationRecords table contains entries in the DonorID column for Jeff (DonorID 1), Kim (DonorID 2), and Angela (Donor ID3), as well as three entries for someone whose DonorID is 6.

Now, consider what happens when you execute the following SQL command:
SELECT Donors.DonorID,Donors.Donorname,
DonationRecords.DonationAmt
FROM Donors, DonationRecords
WHERE Donors.DonorID=DonationRecords.DonorID


Figure B shows the results of this query. Notice that, because the WHERE clause specifies that the entries in the DonorID fields must match for the records to hit the recordset, the donation records for the person whose DonorID is 6 are ignored.

Figure B
The results of our query ignored the donation records attributed to DonorID 6.


In an ideal setting, you wouldn’t allow orphaned records to get into your DonationRecords table. However, this example comes from a real-world case where my client expected me to clean up dirty or missing data. In this case, I audited the paperwork in the donations file to determine whose name should be associated with DonorID 6. When I figured that out, I entered that name into the Donors table.

The INNER JOIN version of the query
This kind of query, in which results contain only records that contain matching values in both tables, is known as an INNER JOIN. To use the INNER JOIN syntax, you’d rewrite the query like this:
SELECT Donors.DonorID,Donors.Donorname.
DonationRecords.DonationAmt
FROM Donors INNER JOIN DonationRecords
ON Donors.DonorID = DonationRecords.DonorID;


The phrase FROM Donors INNER JOIN DonationRecords ON tells SQL to find matching records between the two tables, based on the expression that follows the key word ON. That expression closely resembles the expression I used in the WHERE clause in the first example.

Since there are two ways to accomplish the same kind of query, which should you use? The answer depends on the version of SQL you’re using. In some cases, the INNER JOIN syntax results in more efficient processing.

The LEFT JOIN: Matching records plus orphans from the left
So what makes a LEFT JOIN different from an INNER JOIN? When you execute a query using the LEFT JOIN syntax, SQL does two things:
  • It returns all of the records from both tables that contain matching values, as defined by the ON clause.
  • It also returns all of the records from the table on the left side of the FROM…LEFT JOIN key words, even if there aren’t any matching values in the table on the right.

In this example, the LEFT JOIN answers the following business question: Is there anybody in our Donors list who hasn’t yet made a donation?

To illustrate how a LEFT JOIN works, let’s simply change INNER JOIN to LEFT JOIN in our original SELECT statement, like this:
SELECT Donors.DonorID,Donors.Donorname.
DonationRecords.DonationAmt
FROM Donors LEFT JOIN DonationRecords
ON Donors.DonorID = DonationRecords.DonorID;


When you execute this statement, you’ll get results like those shown in Figure C. Because the Donors table was mentioned first—therefore on the left of the FROM…ON construct—the LEFT JOIN returned records for Mary and Shawna, Donor IDs 4 and 5, respectively, even though there are no corresponding records in the DonationRecords table.

Figure C
When we executed the LEFT JOIN, the query returned all of the values from the Donors table (the one on the left), even if there wasn’t a corresponding record in the DonationRecords table.


The RIGHT JOIN: Matching records plus orphans from the right
When you execute a query using the RIGHT JOIN syntax, SQL does two things:
  • It returns all of the records from both tables that contain matching values, as defined by the ON clause.
  • It also returns all of the records from the table on the right side of the FROM…RIGHT JOIN key words, even if there aren’t any matching values in the table on the left.

Let’s illustrate how the RIGHT JOIN works by changing LEFT JOIN to RIGHT JOIN in our sample SELECT statement:
SELECT Donors.DonorID,Donors.Donorname.
DonationRecords.DonationAmt
FROM Donors RIGHT JOIN DonationRecords
ON Donors.DonorID = DonationRecords.DonorID;


This statement yields results like those shown in Figure D. Because the DonationRecords table was mentioned second—on the right of the FROM…ON construct—the RIGHT JOIN returned records for DonorID 6, even though there is no corresponding record in the Donors table for someone with that ID number.

Figure D
The RIGHT JOIN query returns all of the values from the DonationRecords table. The circled values are from the records for which there is no corresponding entry in the Donor table.


While there are other types of JOINs supported by various flavors of SQL, most implementations of SQL support the INNER JOIN, LEFT JOIN, and RIGHT JOIN constructs. Once you’ve figured out how these three queries work, they can be very useful tools for finding and fixing orphaned records (or other dirty data) in your tables.

Join the discussion
To comment on this article or to share your favorite SQL tips, please start a discussion below or write to Jeff.

 
0 comments

Editor's Picks