Data Management

Build Your Skills: The underpinnings of SQL joins

If a query retrieves the wrong data, the SQL join is an excellent place to begin troubleshooting. Thus, understanding SQL joins in database relationships is critical when problems arise. This article should help provide a good foundation.


The cornerstone of the relational database system is the relationship—the association between tables that allows you to combine data from different but related sources. Most of the time, these associations are intuitive and work just as you expect. However, problems can arise when the wrong join is applied to a relationship and, consequently, the wrong data is retrieved. When this happens, and you're sure the relationship itself is sound, the join is the best place to begin troubleshooting. In this article, I'll introduce the technical specifications that make up joins before moving on to another article that describes the many different types of joins supported by SQL.

What you need to know
This article assumes that you have a basic understanding of database relationships. Most systems support their own unique version of SQL, which will share most of the SQL standards, but not all. This article focuses on three of the most popular relational database management systems: Microsoft SQL Server, Microsoft Access, and Oracle. We'll note differences throughout the article, as they apply to the examples.

SQL standards
Technically, a join is a rule that defines a relationship by determining which records are retrieved or acted upon. That means a single relationship can produce different results, based on the type of join applied to the relationship at the time you run the query. What all that means is that a join is conditional—similar to a WHERE clause or criteria expression—in that the join specifies which records (rows) are selected in both tables. Certainly, the type of join significantly impacts which records are retrieved or acted upon.

Note
Before we get started, we should clarify that this article is about SQL joins, not about the way in which particular systems support SQL joins. Although we will discuss actual examples, for syntax purposes, we won't delve into the many nuances of each version-specific SQL. Examples will include the most basic syntax possible for simplicity's sake. For instance, we won't show alias syntax because they're all different—those details are better left to articles that deal with version-specific SQL information.

For the most part, joins can be specified in a FROM or WHERE clause, so be careful where you specify them. Because the data engine executes the clauses in a specific sequence, placement can affect the results. Use the following guidelines to determine the best place for a particular join:
  • Generally, the data engine fulfills the FROM clause before applying any other clauses. Adding the join here will eliminate records before they're evaluated by any other clauses. As a rule, you'll find that this is the recommended method.
  • Any records that make it past the FROM clause will then be evaluated by an implicit join condition specified in the WHERE clause.

To specify a join in the FROM clause, use the form:
 
SELECT fldlist | *
FROM table1jointable2
  [ON (condition)]
 

where table1 and table2 represent the related tables, join is the type of join applied to the relationship, and condition specifies the columns you're comparing to restrict the query's results. Usually, that condition is a simple expression that compares the primary and foreign key values in the form:
 
ON table1.primarykey = table2.foreignkey
 

The columns (fields) specified in fldlist can include all of the columns in, or just a subset from, the data source. A join can be built on a column not referenced in fldlist. Ambiguous columns must be fully qualified. Also note that this simple syntax is based on just two related tables although SQL can handle multiple joins in the same statement. Although the Equal To (=) operator is the most common, you can use any of the following SQL comparison operators: =, >, <, >=, <=, <>.

At this point, I want to stress that primarykey and foreignkey can refer to any column, as long as the data types match; you can create a temporary relationship between two non-key fields, although, for the most part, you probably won't. Throughout this article, we'll use the primarykey/foreignkey terms because they help emphasize the action that's going on behind the scenes. It's much more difficult to visualize the result of joining two non-key columns. However, technically, you aren't restricted to the primary/foreignkey fields, and you may need the capability some day. Just remember that a relationship based on non-key columns is temporary and exists only between the two tables involved.

Let's look at a quick example using Microsoft SQL Server's sample database, Northwind. The following statement lists each product by name and category:
 
SELECT Products.ProductName, Categories.CategoryName
FROM Products JOIN Categories
  ON Products.CategoryID = Categories.CategoryID
 

If you run the statement in Microsoft Access, use INNER JOIN instead of JOIN, as follows:
 
SELECT Products.ProductName, Categories.CategoryName
FROM Products INNER JOIN Categories
  ON Products.CategoryID – Categories.CategoryID
 

Note
Jet SQL doesn't support the lone JOIN keyword. Transact-SQL and Oracle SQL both interpret the lone JOIN keyword as an inner join.

Occasionally, you may want to express a join condition in the WHERE clause (although you probably won't have this need too often). Remember, when you do so, the join is implicit, but the condition is expressed in the same manner using the same rules for the FROM syntax. The one catch is that you can use this syntax only with an inner join, which you'll learn about in a future article.

Note
There are exceptions to this rule in some systems—Oracle, for one, supports an implicit join in the WHERE clause in an outer join—so consult your system's documentation before depending on this statement.

To express a join in the WHERE clause, use the form:
 
SELECT fldlist | *
FROM table1, table2
WHERE condition
 

where condition is a simple comparison of two columns. For instance, the previous FROM example could just as easily be written as:
 
SELECT Products.ProductName, Categories.CategoryName
FROM Products, Categories
WHERE Products.CategoryID = Categories.CategoryID
 

Both statements yield the same recordset, but that won't always be the case. Oracle SQL takes things one step further with the release of Oracle 9i by supporting additional JOIN syntax. Specifically, Oracle SQL (9i only) supports the USING clause in an implicit join. You can modify the previous syntax by replacing the WHERE clause with a USING clause as follows:
 
Select fldlist
FROM table1, table2
USING(primarykey)
 

Oracle SQL supports this syntax in a full outer join. (You'll learn about the different types of joins in a future article.)

Joining three or more tables
You can join multiple tables, but the complete statement can be fairly complex accommodating several join clauses. The rules are the same, but the most important thing to remember is that each join applies to only one relationship, and each relationship represents an association between only two tables. Therefore, joins must build upon one another. When joining more than two tables, use the following syntax:
 
FROM table1 join table2
  ON table1.primarykey = table2.foreignkey join table3
  ON table2.primarykey = table3.foreignkey
 

The first join applies to table1 and table2; the second join applies to table2 and table3. You might hear table2 referred to as the intermediate or translation table because it's the connection table for the other two tables. In a really complex statement, you might have several intermediate tables. Continue to add joins in this manner until your statement includes all the relationships and tables necessary to get just the right results. For more information on SQL standards, visit the Database Research Group's Web page for a helpful diagram.

About Susan Harkins

Susan Sales Harkins is an IT consultant, specializing in desktop solutions. Previously, she was editor in chief for The Cobb Group, the world's largest publisher of technical journals.

Editor's Picks