Discussion on:
View:
Show:
wow - the examples are great cos i've always wondered how to do this, but i found the text explaining the examples more confusing! some diagrams would be a great help, but thats being picky, very grateful for the article
Subqueries can result in very slow performance in some situations. Be careful to use indexed columns on data sets that could become large, otherwise you could quickly run out of memory/swap space on some DBMS's.
Hi,
This is one of the rare articles on sub queries. Many people are afraid of sub queries and avoid it all costs. Nicely explained. Would have been nicer if Pubs database was used then the reader could execute the query agains a standard data and follow the logic.
Thanks.
This is one of the rare articles on sub queries. Many people are afraid of sub queries and avoid it all costs. Nicely explained. Would have been nicer if Pubs database was used then the reader could execute the query agains a standard data and follow the logic.
Thanks.
I do agree this can help a developer get their results it is important to note that using table joins will be quicker than using a subquery...especially with using the NOT operator. The NOT operator essentially needs to scan the data in full beforereturning the recordset. Kind of like a table scan. The best approach when doing a scalable web app if finding out the table structure and biting the bullet for a table join. Or better yet...buy your DBA lunch and have them build a stored proc for you.
That's right. NOT IN is not recommended for this exact reason - performance is suffering greatly. NOT EXISTS will work better. But lunch for DBA... That was a good one. Except for not every non IT company can afford a DBA.
Personally, I only use "NOT IN" in situations that involve lists such as
select * from blurf where ID not in ('abc','def', 'ghi')
and "NOT EXISTS" where a table/view is involved such as
select * from blurf b where not exists (Select * from OtherTable o where o.BlurfID = b.BlurfID)
However, another thing I tend to do for my analysts and other non-powerusers is create denormalized views. This way they are not fighting with the concepts of normalized data.
select * from blurf where ID not in ('abc','def', 'ghi')
and "NOT EXISTS" where a table/view is involved such as
select * from blurf b where not exists (Select * from OtherTable o where o.BlurfID = b.BlurfID)
However, another thing I tend to do for my analysts and other non-powerusers is create denormalized views. This way they are not fighting with the concepts of normalized data.
In most cases, the (NOT) EXISTS subquery will be faster than the (NOT) IN subquery. In a large database, the difference can be significant. The IN query compares every record between the outer and inner queries, regardless of whether a match has been found; the EXISTS query is boolean and returns control to the outer query as soon as the condition is met.
How would I use SQL subselects to do the following 2 select statements?
SELECT ce.cell_id, cn.cell_name, ce.cell_size, ce.cell_type
FROM cells ce,
cell_names cn,
extent_type et,
extent_size es,
WHERE ce.cell_id = cn.cell_id and
cn.cell_name_current = 'Y' and
ce.cell_size = upper(es.extent_size_name) and
ce.cell_type = upper(et.extent_type_descr);
SELECT ce.cell_id, cn.cell_name, ce.cell_size, ce.cell_type
FROM cells ce,
cell_names cn,extent_type et,
extent_size es,
WHERE ce.cell_id = cn.cell_id and
cn.cell_name_current = 'Y' and
ce.cell_size = upper(es.extent_size_name) and
ce.cell_type is NULL;
Any info would be greatly appreciated.
SELECT ce.cell_id, cn.cell_name, ce.cell_size, ce.cell_type
FROM cells ce,
cell_names cn,
extent_type et,
extent_size es,
WHERE ce.cell_id = cn.cell_id and
cn.cell_name_current = 'Y' and
ce.cell_size = upper(es.extent_size_name) and
ce.cell_type = upper(et.extent_type_descr);
SELECT ce.cell_id, cn.cell_name, ce.cell_size, ce.cell_type
FROM cells ce,
cell_names cn,extent_type et,
extent_size es,
WHERE ce.cell_id = cn.cell_id and
cn.cell_name_current = 'Y' and
ce.cell_size = upper(es.extent_size_name) and
ce.cell_type is NULL;
Any info would be greatly appreciated.
SELECT ce.cell_id, cn.cell_name, ce.cell_size, ce.cell_type
FROM cells ce,
cell_names cn,
extent_type et,
extent_size es,
WHERE ( ce.cell_id = cn.cell_id and
cn.cell_name_current = 'Y' and
ce.cell_size = upper(es.extent_size_name) )
AND
( ce.cell_type = upper(et.extent_type_descr)
OR
ce.cell_type is NULL );
Note that in Oracle you may have to use double parens for controlling comparison order:
SELECT ce.cell_id, cn.cell_name, ce.cell_size, ce.cell_type
FROM cells ce,
cell_names cn,
extent_type et,
extent_size es,
WHERE (( ce.cell_id = cn.cell_id and
cn.cell_name_current = 'Y' and
ce.cell_size = upper(es.extent_size_name) ))
AND
(( ce.cell_type = upper(et.extent_type_descr)
OR
ce.cell_type is NULL ));
FROM cells ce,
cell_names cn,
extent_type et,
extent_size es,
WHERE ( ce.cell_id = cn.cell_id and
cn.cell_name_current = 'Y' and
ce.cell_size = upper(es.extent_size_name) )
AND
( ce.cell_type = upper(et.extent_type_descr)
OR
ce.cell_type is NULL );
Note that in Oracle you may have to use double parens for controlling comparison order:
SELECT ce.cell_id, cn.cell_name, ce.cell_size, ce.cell_type
FROM cells ce,
cell_names cn,
extent_type et,
extent_size es,
WHERE (( ce.cell_id = cn.cell_id and
cn.cell_name_current = 'Y' and
ce.cell_size = upper(es.extent_size_name) ))
AND
(( ce.cell_type = upper(et.extent_type_descr)
OR
ce.cell_type is NULL ));
When I tried the first set of code in your response, it returned the row with the NULL value numerous times (my data only has 1 row with a NULL value) while the rest of the non-null valued rows were returned correctly.
When I tried the second setof code in your response, it returned the row with the NULL value 5 times and the rest of the rows with non-null correctly.
What did seem to work though was this code:
SELECT ce.cell_id
FROM cell_names cn,
extent_type et,
extent_size es,
cell_extent_copy cec
WHERE ce.cell_id = cn.cell_id and
ce.cell_id = cec.cell_id_nmbr and
cn.cell_name_current = 'Y' and
ce.cell_size = upper(es.extent_size_name) and
ce.cell_type = upper(et.extent_type_descr)
UNION
SELECT ce.cell_id
FROM cells ce,
cell_names cn,
extent_type et,
extent_size es,
cell_extent_copy cec
WHERE ce.cell_id = cn.cell_id and
ce.cell_id = cec.cell_id_nmbr and
cn.cell_name_current = 'Y' and ce.cell_size = upper(es.extent_size_name) and
ce.cell_type is null;
And also the following code provided the correct results:
SELECT distinct (ce.cell_id)
FROM cells ce,
cell_names cn,
extent_type et,
extent_size es,
cell_extent_copy cec
WHERE (ce.cell_id = cn.cell_id and
ce.cell_id = cec.cell_id_nmbr and
cn.cell_name_current = 'Y' and
ce.cell_size = upper(es.extent_size_name) )
and
(ce.cell_type = upper(et.extent_type_descr)
or
ce.cell_type is null);
I'm not sure why the 'distinct' had to be put in.
Any idea?
As you've probably noticed, I posted this problem to you also in a SQL Joins article you wrote. Because someone else may be interested in this topic but thinking about it along those thoughts, I will copy this response there as well.
Thank you. MH
When I tried the second setof code in your response, it returned the row with the NULL value 5 times and the rest of the rows with non-null correctly.
What did seem to work though was this code:
SELECT ce.cell_id
FROM cell_names cn,
extent_type et,
extent_size es,
cell_extent_copy cec
WHERE ce.cell_id = cn.cell_id and
ce.cell_id = cec.cell_id_nmbr and
cn.cell_name_current = 'Y' and
ce.cell_size = upper(es.extent_size_name) and
ce.cell_type = upper(et.extent_type_descr)
UNION
SELECT ce.cell_id
FROM cells ce,
cell_names cn,
extent_type et,
extent_size es,
cell_extent_copy cec
WHERE ce.cell_id = cn.cell_id and
ce.cell_id = cec.cell_id_nmbr and
cn.cell_name_current = 'Y' and ce.cell_size = upper(es.extent_size_name) and
ce.cell_type is null;
And also the following code provided the correct results:
SELECT distinct (ce.cell_id)
FROM cells ce,
cell_names cn,
extent_type et,
extent_size es,
cell_extent_copy cec
WHERE (ce.cell_id = cn.cell_id and
ce.cell_id = cec.cell_id_nmbr and
cn.cell_name_current = 'Y' and
ce.cell_size = upper(es.extent_size_name) )
and
(ce.cell_type = upper(et.extent_type_descr)
or
ce.cell_type is null);
I'm not sure why the 'distinct' had to be put in.
Any idea?
As you've probably noticed, I posted this problem to you also in a SQL Joins article you wrote. Because someone else may be interested in this topic but thinking about it along those thoughts, I will copy this response there as well.
Thank you. MH
Performance can be an issue on using subselects but it can be an issue with joins too,
in fact many queries using subselect perform by far faster than their join equivalent.
You should use special care when using correlated queries, these are queries that use data from other queries, like the ones in the article under "Correlate results using EXISTS
", "Use aggregate functions for comparison
" and "Return subselect results
" , as this queries will be executed for each row in the main query, if you happen to have a million records your subselect will be executed a million times!
Anyway subselects are powerful and joins are not per se faster than subselects, you may need to do several convoluted queries using joins to do what you can acomplish with a single query using subselects, thats why the lack of subselect support is the main complaint for mysql.
Using queries based on indexed fields is the best think you can do to enhance performance.
The usage of NOT IN doesn't necesarily means a full table scan, if your query is index based the DBMS will issue a Seek operation with nearly instantaneous response. Another important consideration is that performance is highly dependent on the optimizer that the DBMS uses and that it can be highly influenced by the order of your conditions on the Where clause, in general if you use subselects in your condition make it the last one in your where clause and put the ones based on indexes first, I've seen queries gone from execution time of less than 2 seconds to over 15 minutes by just changing the order of the conditions on the where clause.
By last, consider that there could be more than one way to do the same query, and while results can be the same, differences inperformance can be huge.
in fact many queries using subselect perform by far faster than their join equivalent.
You should use special care when using correlated queries, these are queries that use data from other queries, like the ones in the article under "Correlate results using EXISTS
", "Use aggregate functions for comparison
" and "Return subselect results
" , as this queries will be executed for each row in the main query, if you happen to have a million records your subselect will be executed a million times!
Anyway subselects are powerful and joins are not per se faster than subselects, you may need to do several convoluted queries using joins to do what you can acomplish with a single query using subselects, thats why the lack of subselect support is the main complaint for mysql.
Using queries based on indexed fields is the best think you can do to enhance performance.
The usage of NOT IN doesn't necesarily means a full table scan, if your query is index based the DBMS will issue a Seek operation with nearly instantaneous response. Another important consideration is that performance is highly dependent on the optimizer that the DBMS uses and that it can be highly influenced by the order of your conditions on the Where clause, in general if you use subselects in your condition make it the last one in your where clause and put the ones based on indexes first, I've seen queries gone from execution time of less than 2 seconds to over 15 minutes by just changing the order of the conditions on the where clause.
By last, consider that there could be more than one way to do the same query, and while results can be the same, differences inperformance can be huge.
The query: "SELECT song_name FROM Album WHERE band_name = ?Metallica? AND song_name IN(SELECT song_name FROM Lyric WHERE song_lyric LIKE ?%justice%?)" works when you know you are searching for 'justice', but what if the LIKE portion of the query wasnot known, but was a column in Album? I have a field in Album named Universe, and I want to find all rows in Lyric where song_lyric is a substring of Universe. How would I set that up?
Remember that "Like '?metalica?" because you have wildcarded the FIRST character will not allow indexed to be used. I believe the same happens with a "" as well.
which one is faster IN/EXISTS or Subselect and why ?
- Keyboard Shortcuts:
- Prev
- Next
- Toggle

































