Data Management

Working with nested subqueries in SQL Server

Arthur Fuller provides an example that demonstrates the power of nested subqueries. He notes that nested subqueries can help you find the answers you need when the pieces are scattered across a number of tables.

Many SQL Server programmers are confused about the use of subqueries, especially nested subqueries (i.e., a subquery that contains a subquery). Let's start with the basics.

There are two kinds of subqueries: standard and correlated. The standard subquery executes once, and its results are fed into the parent query. A correlated subquery executes once for every row retrieved by the parent query. In this tip, I shed some light on the power of nested subqueries (I'll save correlated subqueries for another day).

Consider this problem: You want to generate a list of the salespeople who have sold flat washers. The data you need is scattered among four tables: Person.Contact, HumanResources.Employee, Sales.SalesOrderHeader, and Sales.SalesOrderDetail. In SQL Server, you write it outside-in, but it's often helpful to think about it inside-out first, i.e., working out as far as required one statement at a time.

Working from the inside out, you examine the Sales.SalesOrderDetail table, matching the ProductNumber value with a LIKE statement. You connect these rows to the Sales.SalesOrderHeader table from which you obtain the SalesPersonIDs. Then you connect to the HumanResources.Employee table using the SalesPersonID. Finally, you use the ContactID to connect to the Person.Contact table.

USE AdventureWorks ;
GO
SELECT DISTINCT c.LastName, c.FirstName
FROM Person.Contact c JOIN HumanResources.Employee e
ON e.ContactID = c.ContactID WHERE EmployeeID IN
(SELECT SalesPersonID
FROM Sales.SalesOrderHeader
WHERE SalesOrderID IN
(SELECT SalesOrderID
FROM Sales.SalesOrderDetail
WHERE ProductID IN
(SELECT ProductID
FROM Production.Product p
WHERE ProductNumber LIKE'FW%')));
GO

This example illustrates several cool things about SQL Server. You see that it is possible to substitute a SELECT statement for the IN() parameter. In this example, you do it twice, thus creating a nested subquery.

I am a big fan of normalization, although I don't take it to absurd lengths. Normalization carries with it the increased complexity of various queries. Subqueries can prove very useful in these circumstances, and nested subqueries even more so.

When the answers you need are scattered across lots of tables and you must somehow bring the pieces together again, you may find that a nested subquery is the way to go.

Miss a tip?

Check out the SQL Server archive, and catch up on the most recent editions of Arthur Fuller's column.

TechRepublic's free SQL Server newsletter, delivered each Tuesday, contains hands-on tips that will help you become more adept with this powerful relational database management system. Automatically subscribe today!

13 comments
sdfsdf
sdfsdf

boo hiss rubbish move that bit in the IN clause to a JOIN for heavens sake

pmetcalf
pmetcalf

pun intended. I agree with the article author. Of course it depends on the data, number of records, table indexes. In many cases I've reduced query processing time significantly with IN. If you're on a production box, you will have enough meat and magma inside so SQL Svr will clean up after you're done. If you have to worry about temp tables cleanup, if they are indeed created, perhaps you need a little more muscle. Also it is nice to have that flexibility if you need it.

Jeff_D_Programmer
Jeff_D_Programmer

Firstly, the example given here is an extremely poor one because Using IN creates a temp table for every item in the IN list. If each of these subqueries only returns 10 records, this query will create 1000 temp tables. Very inefficient. SQL is not designed to handle IN lists very well, but it IS designed to handle joins. That's why relational databases were created! Secondly, whenever I see a subquery, all it tells me is that the person doing the coding doesn't know what they're doing. I have NEVER seen a subquery that couldn't be written cleaner and more efficiently with the proper use of joins.

gordon.feeney
gordon.feeney

I'd have to say that I am in agreement here. The answer to most T-SQL and SQL Server questions that I come across or pose would tend to be 'it depends'. I've certainly used the nested-query approach but I'm also aware of the overheads involved in parsing very large datasets that can be returned by sub-queries. So, if I think that a problem warrants a more relational solution, then I'll use JOINS. Common sense I guess in many instances. In closing I'd have to say to Jeff_D_Programmer that explicitly criticizing someone's programming skills in a forum is not considered to be good netiquette. By all means offer constructive advice but saying that 'all it tells me is that the person doing the coding doesn't know what they're doing' is not very constructive. YaHozna.

nicruzer
nicruzer

Could you please post an example of how you might change the given example to improve it using joins? Do you have benchmark tests to prove your point(s)? If so, please post those results as well. Thanks.

chapman.tim
chapman.tim

I've never heard of the temp table being created for every record returned in a subquery...so Id love to see some documentation that says anything like it. It does create a virtual table to use though. Also, SQL IS designed to handle subqueries just fine...in fact, sometimes it is more effecient to use a subquery than use a join...it is rare, but I've seen it happen. The main hinderance is that when you perform a subquery using the IN operator, a distinct list of values is returned, and DISTINCT does incur costs. Subqueries can be usually written as JOINs, but it is a very bold, and incorrect, statement to say that someone who uses a subquery doesn't know what they are doing.

Zeroesque
Zeroesque

Jeff_D_Programmer does mention that he knows SQL Server 03 in his member profile, so consider the source.

Jeff_D_Programmer
Jeff_D_Programmer

SELECT DISTINCT c.LastName, c.FirstName FROM ( ( ( Person.Contact c JOIN HumanResources.Employee e ON e.ContactID = c.ContactID ) JOIN Sales.SalesOrderHeader s ON s.SalesPersonID = e.EmployeeID ) JOIN Sales.SalesOrderDetail d ON d.SalesOrderID = s.SalesOrderID ) JOIN Production.Product p ON p.ProductID = d.ProductID WHERE p.ProductNumber LIKE 'FW%' Both I/O Cost and CPU Cost are lower for this version by about 50%

Jeff_D_Programmer
Jeff_D_Programmer

I didn't say it created a temp table for every record returned by a sub query, I said it creates a tempt table for every element in an IN clause - regardless of where the elements come from. In other words, it's not the sub-query that creates the temp tables, it's the parent query as it's parsing through the IN elements.

btmanfai
btmanfai

I tested the query several times in my PC using the default Adventurework exmaple DB. The result is the author's query run faster and less I/O and CPU then yours. I think the IN subquery is not necessary bad.

chapman.tim
chapman.tim

...maybe you were confused as to what he was talking about. When you do use an IN() statement with literals, (or subqueries for that matter), the values returned are used by the optimizer as one large OR statement...you can look at the argument text from the execution plan. I've not found any evidence where a ton of temp tables are created for a simple IN statement, so I really doubt that it occurs.

Jeff_D_Programmer
Jeff_D_Programmer

Worked hand-in-hand with a Microsoft SQL 7/2000 Implementation team a few years ago while writing County and State property tax collection software. That's word for word what the lead guy said. At the time we were using IN with a list-string, but (according to him), where the IN elements come from doesn't matter. Better to create a temp table and then do a join than use an IN clause with more than a few entries. I would assume that hasn't changed much with the upgrade to 2005 since most of the upgrade involves object granularity and XML implementation, not core engine functionality.

Editor's Picks