It’s sometimes difficult to know which SQL syntax to use when combining data that spans multiple tables. SQL provides several different statements for performing this type of task; knowing which to apply will yield efficient and correct results. In this article, I’ll discuss some of the more frequently used methods for consolidating data from multiple tables into a single result set: SELECT, JOIN, UNION and subqueries.
SEE: Take advantage of these quick SQL lessions with this certification bundle from TechRepublic Academy
The most recent stand for SQL syntax is SQL:2016 ISO, but little has changed for these basic querying statements in decades. However, it’s important to note that some databases have unique enhancements that can yield unexpected results if not applied correctly. Please refer to your manufacturer’s documentation to ensure you’re using the proper dialect for your system.
A simple SELECT statement is the most basic way to query multiple tables. You can call more than one table in the statement’s FROM clause to combine results from multiple tables. Here’s an example of how this works:
SELECT table1.column1, table2.column2 FROM table1, table2 WHERE table1.column1 = table2.column1;
Dot notation uses the period character to separate the table and column. If the column appears only in one of the referenced tables, you don’t need to include the fully qualified name, but it may be useful to do so for readability.
The comma character separates the tables in the FROM clause. You can include as many tables as needed, although some databases have a limit to what they can efficiently handle before introducing a formal JOIN statement.
This basic syntax is a simple INNER JOIN. Some databases treat it exactly the same as an explicit JOIN.
The WHERE clause expresses the criteria — returning only those records where the value in table1.column1 equals the value in table2.column1. The result is a temporary table that contains values from two tables, where the record satisfies the WHERE expression.
It’s worth noting that your conditions for comparison don’t have to be the result set. In the example above, table1.column1 and table2.column1 are used to combine or join the tables, but only values from table2.column2 are returned.
Include more than two tables with AND
You can extend this functionality to more than two tables using AND keywords in the WHERE clause. You can also use such a combination of tables to restrict your results without actually returning columns from every table. For example:
SELECT table1.column1, table2.column2 FROM table1, table2, table3 WHERE table1.column1 = table2.column1 AND table1.column1 = table3.column1;
This uses AND to include two conditions in the WHERE clause.
The data comes from table1.column1 and table2.column2 but only when both conditions in the WHERE clause are met. Table3.column1 restricts the data returned even though no data from this table makes it to the result set. Note that table3 must be referenced in the FROM clause.
Be warned, however, that this method of querying multiple tables is effectively an implied JOIN. Your database may handle things differently, depending on the optimization engine it uses. Also, neglecting to define the nature of the correlation with a WHERE clause can give you undesirable results, such as returning the rogue field in a column associated with every possible result from the rest of the query, as in a CROSS JOIN.
If you’re comfortable with how your database handles this type of statement, and you’re combining two or just a few tables, a simple SELECT statement will meet your needs.
JOIN works in the same way as the SELECT statement above: It returns a result set with columns from different tables. The advantage of using an explicit JOIN over an implied one is that it offers greater control over your result set and possibly improved performance when many tables are involved.
There are several types of JOIN — LEFT, RIGHT, and FULL OUTER; INNER; and CROSS. The type you use is determined by the results you want to see. For example, using a LEFT OUTER JOIN will return all relevant rows from the first table listed, while potentially dropping rows from the second table listed if they don’t have information that correlates in the first table.
This differs from an INNER JOIN or an implied JOIN. An INNER JOIN will only return rows for which there is data in both tables. Compare the following query after using JOIN instead of WHERE:
SELECT table1.column1, table2.column2 FROM table1 INNER JOIN table2
ON table1.column1 = table2.column1;
The correlating column is still the same, table2.column1, even though the query doesn’t retrieve data from this column.
Subqueries, or subselect statements, are a way to use a result set as a resource in another query to limit or refine results. The alternative is often multiple queries or unwanted manipulation of the raw data.
With a subquery, you can reference tables to determine inclusion of data or, in some cases, return a column that is the result of a subselect. The following example uses two tables:
SELECT column1 FROM table1 WHERE EXISTS ( SELECT column1 FROM table2 WHERE table1.column1 = table2.column1 );
One table, table1, contains the data being returned, while the other, table2, provides a comparison value to determine the data actually returned from table1.
One important consideration when using subqueries is performance. Convenience comes at a price, and depending on the size, number, and complexity of tables and statements you use, you may want to allow your application to handle processing.
Each query is processed separately in full before being used as a resource for your primary query. If possible, creative use of JOIN may provide the same information with less lag time.
The UNION statement is another way to return information from multiple tables using a single query. The UNION statement allows you to perform queries against several tables and return the results in a consolidated set, as in the following example:
SELECT column1, column2, column3 FROM table1 UNION SELECT column1, column2, column3 FROM table2;
This statement consolidates data from table1 and table2 and returns a result set with three columns containing data from both queries. There’s no JOIN or condition to meet.
By default, the UNION statement will omit duplicates between the tables unless the UNION ALL keyword is used.
If your column names don’t match when you use the UNION statement, use aliases to give your results meaningful headers:
SELECT column1, column2 as Two, column3 as Three FROM table1 UNION SELECT column1, column4 as Two, column5 as Three FROM table2;
The result set will have three columns, column1, Two and Three — where Two contains values from table1.column2 and table2.column4 and Three contains values from table1.column3 and table2.column5.
As with subqueries, UNION statements can create a heavy load on your database server, but for occasional use, they can save a lot of time.
When to use SELECT, JOIN, UNION or subqueries
When it comes to database queries, there are usually many ways to approach the same problem, but one will usually be more efficient than the others. We’ve reviewed some of the more frequently used methods for consolidating data across multiple tables. While some of these options may affect performance, practice will help you know when it’s appropriate to use each type of query.
SEE: For more tutorials like this, check out the Developer topic at TechRepublic.
Subscribe to the Developer Insider Newsletter
From the hottest programming languages to commentary on the Linux OS, get the developer and open source news and tips you need to know. Delivered Tuesdays and Thursdays