Saturday, January 20, 2018

Are SQL Server subqueries bad? Let's find out!

By Steve Endow

For the past several years, I've noticed that I have generally avoided using subqueries, based on a suspicion that they are probably less efficient than a JOIN.  I do still use them for one-time or ad-hoc queries where performance isn't a concern, but I have been avoiding them for any production queries that I release.  But I haven't done any research to support my suspicion.

This Saturday morning, while working on another SQL query optimization issue, I figured I would try a test to compare the performance of a simple subquery vs. a JOIN.

What do you think?  Do you think that subqueries are typically slower than JOINs?  If so, how much slower?

Here's a YouTube video where I test my sample queries and review the results.



Before doing my test, I searched the interwebs and I found a post (unfortunately the images aren't showing for me) that appears to definitively demonstrate the correlated subqueries perform much worse than similar JOIN queries.  The explanation made sense.

To qualify my test setup: These queries were run on SQL Server 2014 SP1 using Management Studio 17.4 against the GP 2016 Fabrikam/TWO database. And also notable is that I have 73,752 records in my SOP30300 table--which is probably quite a bit more than a vanilla TWO database.  I suspect this is important, as the results may be different for other SQL Server versions, and may vary based on the number of tables and records.




Here is my very simple sample subquery, which I believe is technically called a "correllated subquery", because it uses a value from the main query to filter the results of the subquery. (or the inner query uses a value from the outer query)

The second query produces the same result by using a JOIN.


SET STATISTICS IO, TIME ON

SELECT i.ITEMNMBR AS ItemNumber, (SELECT COUNT(*) FROM dbo.SOP30300 WHERE ITEMNMBR = i.ITEMNMBR) AS LineCount FROM IV00101 i

SELECT i.ITEMNMBR, COUNT(sop.ITEMNMBR) AS LineCount
FROM IV00101 i
LEFT OUTER JOIN SOP30300 sop ON sop.ITEMNMBR = i.ITEMNMBR
GROUP BY i.ITEMNMBR

SET STATISTICS IO, TIME OFF


Pretty simple--it lists all item numbers, and for each item, counts how many SOP lines exist for that item.

My assumption was that the subquery would be noticeably slower than the JOIN.

To my surprise, the two queries had effectively the same cost and performance.


Interestingly, there are more reads for the subquery version, but the performance is effectively the same--in fact the subquery was slightly faster in this test.  This is not what I expected.

But there's more!

Take a look at the actual execution plan.  Take a good, close look.  What do you see?



I am sure that the first few times I ran the two queries, there WAS a difference in the execution plan, but it still showed a 50%/50% for the two queries--effectively the same cost. But after several runs, I now consistently see the exact same query plan for both queries.  The EXACT same query plan.

WHAT?  This is absolutely not what I expected.

And after running the query a few more times, the statistics are now always identical.  I'm afraid the machines are learning.



So...what does this mean?  Well, I wouldn't call this a universal rule, but, for a simple query, on SQL Server 2014 SP1, with a relatively small result set, it appears that SQL Server is able to figure out that the two different queries are effectively the same, and after running the queries a few times, SQL uses the exact same execution plan for both queries.

Does this mean that subqueries are exactly the same as JOINs?  I would assume the answer is no. If you have a more complex query, a slightly different subquery, more data, more JOINs, or a different set of indexes on the tables, it could be that a subquery produces a wildly different execution plan that is much slower than an equivalent JOIN.  And if you run the same query on a different version of SQL Server, the query optimizer may behave completely differently.

But what this did show me is that correllated subqueries are not ALWAYS more costly than a JOIN. So I don't necessarily have to avoid them if they are easier to use or make query design or prototyping faster or easier.

And now I know!


Steve Endow is a Microsoft MVP in Los Angeles.  He is the owner of Precipio Services, which provides Dynamics GP integrations, customizations, and automation solutions.

You can also find him on Twitter, YouTube, and Google+



No comments: