Data Management

Master joins with these concepts

Data normalization is a fundamental aspect of database design, but you must also be able to collect data from the various tables. Learn how to use database joins to facilitate data referencing.


To efficiently handle queries and good database design, you must have a good understanding of SQL join statements. Beyond the logistics of basic syntax, there are several concepts that will round out your knowledge of joins and facilitate data referencing, such as aliasing, self joins, and ON clauses.

Aliasing isn't just for joins
Aliasing is an SQL feature that allows you to create a shorthand notation for a table or column within a query. It can also come in handy when dealing with the self join, which I’ll explain below.

An alias has the general format table_name.column_name AS alias, which allows you to refer to the aliased item by the handle you have assigned it. In very complicated JOIN statements, aliases make things easier to read and shorter to type.

This example shows a typical alias usage:
SELECT Co.company_name AS coname, Ind.industry_name AS indname
FROM Company AS Co LEFT JOIN Industry AS Ind
ON Co.industry_id = Ind.industry_id
WHERE coname LIKE ‘%Tech%’ AND indname = ‘Computing’;


Note that the AS keyword is optional; however, I recommend using it for clarity. Also, columns can be aliased using the format alias = table_name.column_name, but tables cannot be aliased this way.

The self join uses only one table
The self join is a type of JOIN statement used on a single, normalized table. Essentially, you can use a self join to derive hierarchical relationships of rows within a table. You must use aliases to describe the table as two virtual instances of itself, and then you can join them together.

An example of a common use of a self join might be in an employee table that lists all employees and their managers. A self join would allow you to make the association with a single query, where you might otherwise have to deal with a loop in your code.

Another example of a self join use might include tracking a development project in which you’ve laid out your tasks in a table. Among other information, your table includes a TaskID column with a unique number for each row, and a DependenceID column that includes the task ID of the task that must be completed before the current task can begin.

To get a list of tasks and their dependencies, you can use a self join like this:
SELECT
Current.task_descr AS Task,
Previous.task_descr AS Dependence
FROM Project AS Current
LEFT JOIN Project AS Previous
ON Current.DependenceID = Previous.TaskID;


This statement will result in two columns, titled Task and Dependence with descriptions of the relevant information in each row. Using the self join, you can use the data from a single table to create a self-referencing representation of the hierarchy built into the data itself.

The USING and ON clauses can join on multiple columns
Besides the ON clause, you can also use the USING clause to perform an equi-join, which equates a column’s value from one table to a column’s value in the joined table. You can also use either of these statements to join multiple columns, which might come in handy when you’re nesting JOIN statements.

In the ON clause, you may specify column names, which means you can make comparisons of data from two tables whose column names don’t match. To perform a search on multiple columns, separate criteria with an AND keyword. (See an example of this usage in the next section on JOIN order.)

In the USING clause, the column names must be the same and must exist in both tables being joined. If you’re using multiple columns for the join, separate column names with a comma. The following is a basic example of the syntax for the USING clause:
SELECT * FROM User INNER JOIN Event USING (userid);

Controlling JOIN order
Parentheses aren’t required in JOIN statements, but you can use them to change the natural order of the JOIN statement when more than two tables are involved. Normally, JOIN statements are either processed from left to right (in the order they appear in the query) or are determined by the database’s query optimizer prior to execution.

Some databases, such as SQL Server, Oracle, and DB2 also support the concept of JOIN HINT keywords. These keywords are inserted into a JOIN statement to control the order in which the JOINs are processed. For example, a MERGE JOIN will join two tables together before any other joins are processed, and a HASH JOIN will join a table into the results of a JOIN statement that has already been executed. JOIN HINTs can result in extremely poor database performance, however, and should be left to the DBAs.

Your alternative is to use parentheses to control JOIN order of execution. Here’s an example of how you would use the parentheses, where Catalog, Product, and Color are tables:
SELECT Catalog.item, Catalog.item_color, Product.item, Color.color_name
FROM Catalog
FULL OUTER JOIN ( Product CROSS JOIN Color )
ON Catalog.item = Product.item
AND Catalog.item_color = Color.color_name;


Become a database pro
The JOIN statement in particular is useful for navigating to the information you need with the least amount of juggling. By using the concepts discussed above, you’ll be well on your way to mastering the JOIN statement and database usage, in general. Aliasing, self joins, and nested join statements will help you master normalized databases like a pro.

 

Editor's Picks