By Chris Seifert
Often, especially in smaller databases, simple SQL statements, such as SELECT and UPDATE, are all you need to get the required data. This situation changes as a database grows—especially in number of tables. You will increasingly find it necessary, with these databases, to extract information from multiple tables. You can do so with the join query. In this article, I will explain what a join is and how it differs from simpler queries.
This article uses an example database containing the film information shown in Tables A and B.
Simple SQL queries
Let’s begin with simple SQL queries and proceed to using joins. For example, if you want the title, year, and actor for the movie Forrest Gump, you can use the following two SELECT queries with the tables:
SELECT title, year FROM Movie WHERE MovieID='22'
SELECT Name FROM Actor WHERE MovieID='22'
Afterward, a bit of programming is necessary to bring the information together. But why waste all that time coding when you can use a join?
What is a join?
Joins are possibly both the most frequently used and the most confusing aspect of SQL, so let’s continue with the example to gain a better understanding. Rather than the multiple query method, you can use the following join to get the same result without additional programming:
SELECT title, year, actor FROM Movie, Actor WHERE
(Movie.MovieID=Actor.MovieID) AND MovieID='22'
Let’s take a closer look at this command. We need to examine three segments in this query. The first—SELECT title, year, actor—names what you are looking for. The second—FROM Movie, Actor—names the tables from which you are querying. Finally, the third section—WHERE (Movie.MovieID=Actor.MovieID) AND MovieID='22'—determines which records are chosen.
What is the difference?
You may wonder about the difference between the two approaches I’ve introduced. In the SELECT section, you select fields from both tables. In the FROM section, you list all the tables you are using. In the WHERE group, you choose the fields that share common data. This is the foundation of joins: The tables must have reference points that connect them to one another.
In the example, the common fields are the MovieID fields. The fields don’t have to be named the same, but it is often a good idea to consider doing so when designing your database. As long as the two fields have compatible data types (such as a floating point and an integer), you can use them for joining purposes. Bear in mind that these fields must have the same data, regardless of their data type. Joining the two tables where Movie.MovieID=Actor.ActorID won't do you any good, because this statement doesn’t compare two sets of data that describe the same things. Now, let’s take a closer look at joins.
Types of join operations
There are four kinds of joins: cross, inner, outer, and self joins. The join in our example is an inner join. Changing the syntax to make this more apparent, the join can also look like this:
SELECT title, year, actor FROM Movie INNER JOIN Actor ON
(Movie.MovieID=Actor.MovieID) AND MovieID='22'
The inner join returns only the records that match the specific criteria you ask for (Movie.MovieID=Actor.MovieID) and nothing else. This result is different from that of an outer join, which can return these records as well as unmatched rows from one or both of the tables you are pulling from.
A self join joins data from different fields within the same table. Self joins are rarely required.
A cross join is a specialized inner join. It does the same thing as the inner join, but it does not have a WHERE clause, making it the Cartesian product of the tables you are comparing. Thus, the cross join query could look like this:
SELECT * FROM Actor, Movie
or like this:
SELECT * FROM Actor CROSS JOIN Movie
The result of the cross join would be a virtual table like Table C.
Only three of the records (those highlighted in yellow) in Table C are accurate for our purposes, but a cross join could be useful in some situations where every possible combination is desired. The biggest issue concerning cross joins is the fact that they can easily slow a database; the amount of required processing increases quickly as the number of records increases. In this case, you have three records for each table, making nine results in the cross join. What if you had 10,000 records in each table? Or 10 million? The processing time could cripple the system.
I’ve shown you only the beginning of the real power of the join in SQL. By incorporating these tools into your SQL toolboxes, you can save considerable programming time and effort with advanced queries. In the next article, I’ll explore inner and outer joins in detail, examining the differences among left, right, and full inner or outer joins.