Do you ever find yourself juggling queries to get the information you need? Subselects, also called subqueries, may be just what you’ve been looking for. This SQL feature allows you to query within a result set, creatively limit your result set, or correlate your results with an otherwise unrelated query in a single call to the database. In this article I’ll provide several subselect examples and discuss when to use them.


Need a review of SQL basics?

These articles should help you get on track:
SQL Basics I: Data Queries—covers database terminology and the four basic query types.
SQL Basics: SELECT statement options—covers the SELECT statement in detail and explains aggregate functions.


Search within a result set
The concept of a subselect is simple: One select query is nested inside another query, creating a resource otherwise unavailable for searching in a single statement. The subselect allows query consolidation. The burden of result set comparisons is placed on the database rather than application code.

One way to use this functionality is by locating a value related by column with comparable data in two tables. For example, I have a database with two tables, Album and Lyric. I can easily find the name of every song by Metallica that contains the lyric “justice” with the following subquery statement:
SELECT song_name FROM Album
WHERE band_name = ‘Metallica’
AND song_name IN
(SELECT song_name FROM Lyric
WHERE song_lyric LIKE ‘%justice%’);

This example is simple. I selected all Metallica songs from the Album table. Next, I selected all songs containing the lyric “justice” from the Lyric table. Finally, I used the IN keyword to return the song names from the Album table appearing in the Lyric table result set.

I used the Lyric table result set to limit our results from the Album table. The subselect portion of the WHERE clause is completely self-contained, so I don’t have to use fully qualified column names such as Album.song_name and Lyric.song_name. I’m not returning any values from the Lyric table in the final result set. If I needed the song lyrics themselves, I’d use a JOIN statement.

Exclude results using NOT IN
You can use the NOT IN keyword to obtain results explicitly not contained in another result set. For example, I might want to return all Metallica songs in the “And Justice for All” album that do not contain the word “justice” by way of the following code:
SELECT song_name FROM Album
WHERE album_name = ‘And Justice for All’
AND band_name = ‘Metallica’
AND song_name NOT IN
(SELECT song_name FROM Lyric
WHERE song_lyric LIKE ‘%justice%’);

In the previous SQL code, I selected all songs from Metallica’s album “And Justice for All,” followed by all songs with the lyric “justice,” and finally returned all songs from the Album result set that do not appear in the Lyric result set. Rather than returning two queries and using code to compare arrays, you can get the exact results with a single statement.

Correlate results using EXISTS
Sometimes you can access the same data through more than one avenue, and you need to match up (or correlate) your results to get a cross-section of values. For example, I can get a list of Metallica songs by searching the Album table. However, I can get a list of Metallica songs performed by Damage, Inc. from my Cover table. I can correlate the values by directly comparing query results from both tables.
SELECT Album.song_name FROM Album
WHERE Album.band_name = ‘Metallica’
AND EXISTS
(SELECT Cover.song_name FROM Cover

WHERE Cover.band_name = ‘Damage, Inc.’
AND Cover.song_name = Album.song_name);

In the SQL code, I used fully qualified column names because I’m directly comparing the two tables, not merely using the result set as a passive resource. I’m not returning results from the Cover table. Some databases support the NOT EXISTS keyword, ensuring that you do not have a match.

Use aggregate functions for comparison
In addition to using subselects to examine data in related tables, it’s also possible to use an aggregate function in a WHERE clause subselect to make a determination about the primary result set. For example, I want to verify Album table entries for every Metallica song. Also, I want to return the album names that have missing tracks. Conveniently, the AlbumInfo table contains a column (album_tracks) signaling how many tracks there should be.
SELECT AlbumInfo.album_name FROM AlbumInfo
WHERE AlbumInfo.band_name = ‘Metallica’
AND album_tracks <>
(SELECT COUNT(*) FROM Album
WHERE Album.album_name = AlbumInfo.album_name);

I’ve successfully returned the names of all Metallica albums in which the expected number of tracks does not equal the number of song entries in the Album table.

Return subselect results
What if I’m still worried about the number of songs per album, and I need to generate a comparison report? It’s possible to return the results of a subselect as part of my final result set. This feature is most frequently used for aggregate functions. Usually, you can just access other tables directly as part of your query. The next example will return every Metallica album, the number of tracks it should contain, and how many entries are included in the Album table:
SELECT AlbumInfo.album_name, album_tracks,
(SELECT COUNT(*) FROM Album
WHERE Album.album_name = AlbumInfo.album_name)
FROM  AlbumInfo
WHERE AlbumInfo.band_name = ‘Metallica’;

Another powerful example of this last concept involves changing the album_tracks value in the AlbumInfo table to the actual number of entries in the Album table:
UPDATE AlbumInfo SET album_tracks =
SELECT COUNT(*) FROM Album
WHERE AlbumInfo.album_name = Album.album_name)
WHERE AlbumInfo.band_name = ‘Metallica’;

The subselect statements in the two previous examples are executed as a self-contained unit.

Subselect comparison keywords (ALL, SOME, ANY)
In addition to using standard query functionality, there are three keywords that let you compare an expression value to a one-column subselect statement result set. These keywords return a TRUE or FALSE Boolean value. The ALL keyword requires that all values from the subselect comply with the comparison operator. The SOME and ANY keywords require that at least one pair complies. Here’s a simple example of the ALL keyword.
SELECT * FROM AlbumSales
WHERE album_gross >
ALL (SELECT album_costs FROM AlbumProduction);

The above example will return all records from the AlbumSales table that grossed more than the cost to produce the most expensive album in the AlbumProduction table. If ANY was used instead of ALL, the statement would return records for all albums that grossed more than the least costly album production. The statement = ANY (equal any) is equivalent to the keyword IN. The statement <> ALL (not equal all) is equivalent to using the NOT IN keyword. The keywords ANY and SOME are equivalent. Support for these keywords varies between database manufacturers, so be sure to check your documentation if your results are not what you expected.

Who’s afraid of normalized data structures?
The subselect query syntax is simple and straightforward—knowing when to use it is the difficult part. If you’ve ever balked at normalized data structures, the subselect statement will help you realize their potential.

 

Subscribe to the Data Insider Newsletter

Learn the latest news and best practices about data science, big data analytics, artificial intelligence, data security, and more. Delivered Mondays and Thursdays

Subscribe to the Data Insider Newsletter

Learn the latest news and best practices about data science, big data analytics, artificial intelligence, data security, and more. Delivered Mondays and Thursdays