Optimize SQL Server queries with these advanced tuning techniques

Now that you know how to speed up your SQL Server database queries, you can start delving into some of the more advanced tuning options.

The best way to tune performance is to try to write your queries in a number of different ways and compare their reads and execution plans. Here are various techniques that you can use to try to optimize your database queries.

Use JOINs rather than subqueries

If possible (and if it makes sense), I suggest using JOIN statements rather than subqueries to improve performance. When a subquery is used as criteria in a SELECT statement, the values returned from the subquery are distinct. Returning a distinct list of values requires additional processing, which can slow down your queries.
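
For illustration, here is a minimal sketch of the rewrite, using hypothetical SalesHistory and Product tables (the table and column names here are mine, not from a real schema):

-- Subquery form: the list of ProductIDs is effectively de-duplicated before the comparison
SELECT sh.SaleID, sh.SalePrice
FROM SalesHistory sh
WHERE sh.ProductID IN
      (SELECT p.ProductID FROM Product p WHERE p.Category = 'Hardware')

-- JOIN form: the same rows, expressed as an inner join on ProductID
SELECT sh.SaleID, sh.SalePrice
FROM SalesHistory sh
INNER JOIN Product p ON p.ProductID = sh.ProductID
WHERE p.Category = 'Hardware'

Note that the two forms return identical results only when ProductID is unique in Product; if it is not, the JOIN can repeat SalesHistory rows, so check the data before swapping one for the other.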

Use explicit transactions

When data manipulation occurs in the database, the actions are written to the transaction log. If your code executes many individual DML statements, it can pay to place them inside a single explicit transaction. In autocommit mode, each statement is committed on its own, forcing its log records to be flushed to disk before the next statement runs; wrapping the statements in one transaction defers that work until the commit. For example, the following batch takes approximately seven seconds to execute on my laptop:

CREATE TABLE InsertTable
(
      IDCol INT IDENTITY(1,1),
      ColVal INT
)
GO

DECLARE @Counter INT
SET @Counter = 1
WHILE @Counter < 15000
BEGIN
      INSERT INTO InsertTable(ColVal)
      SELECT DATEPART(ms, GETDATE())
      SET @Counter = @Counter + 1
END

If I wrap the INSERT statements inside of a transaction, the same batch takes a little less than two seconds to execute. This is because the individual statements are no longer committed one at a time; their log records are hardened to disk only when the transaction commits, which reduces the number of writes to the log.

DECLARE @Counter INT
SET @Counter = 1

BEGIN TRAN
WHILE @Counter < 15000
BEGIN
      INSERT INTO InsertTable(ColVal)
      SELECT DATEPART(ms, GETDATE())
      SET @Counter = @Counter + 1
END
COMMIT TRAN

Note: I advise you to use this approach with care. If there are too many statements inside a transaction, it will increase the duration of the transaction, which increases the amount of time locks are held on the objects inside of the transaction.

Use UNION ALL instead of UNION

When you use the UNION clause to concatenate the results from two or more SELECT statements, duplicate records are removed. This duplicate removal requires additional computing to accomplish. If you are not concerned that your results may include duplicate records, use the UNION ALL clause, which concatenates the full results from the SELECT statements.
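
As a quick sketch (the table names here are made up for illustration):

-- UNION: the combined result set is de-duplicated, which costs extra work
SELECT CustomerID FROM CurrentOrders
UNION
SELECT CustomerID FROM ArchivedOrders

-- UNION ALL: the second result set is simply appended, duplicates and all
SELECT CustomerID FROM CurrentOrders
UNION ALL
SELECT CustomerID FROM ArchivedOrders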

Use EXISTS when possible

When you need to check for the existence of rows that meet certain conditions, it is usually faster to use the EXISTS predicate than COUNT(*). This is because COUNT(*) must count every record that satisfies the criteria, while EXISTS can return true as soon as it finds the first record that meets them.
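
Here is a minimal sketch of the two approaches, assuming a SalesHistory table with a SalePrice column (the column name is illustrative):

-- COUNT(*) form: every qualifying row is counted before the comparison
IF (SELECT COUNT(*) FROM SalesHistory WHERE SalePrice > 1000) > 0
      PRINT 'Large sales exist'

-- EXISTS form: the scan can stop at the first qualifying row
IF EXISTS (SELECT 1 FROM SalesHistory WHERE SalePrice > 1000)
      PRINT 'Large sales exist'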

STATISTICS IO

There are different ways to determine the best way to write your queries. Two of my favorite methods are looking at the number of logical reads produced by the query and looking at graphical execution plans provided by SQL Server Management Studio. For determining the number of logical reads, you can turn the STATISTICS IO option ON. Consider this query:

SET STATISTICS IO ON
SELECT * FROM SalesHistory

The following is returned in the Messages window in SQL Server Management Studio:

Table 'SalesHistory'. Scan count 1, logical reads 33, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

STATISTICS IO returns several pieces of data, but I am really only concerned with the logical reads figure, which tells me the number of pages read from the data cache. It is the most useful number to me because it stays constant across runs of the same query, whereas execution time can vary with external factors such as locking by other queries.

When I'm tuning my queries, my goal is to get the number of logical reads as low as possible. Fewer logical reads typically leads to faster execution times.
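
As a sketch of that workflow, you might run the COUNT(*) and EXISTS forms from the earlier section back to back with STATISTICS IO turned on, then compare the logical reads reported for each statement in the Messages tab (the SalePrice column is again illustrative, and the actual read counts depend on your data and indexes):

SET STATISTICS IO ON

-- Form 1: COUNT(*) counts every qualifying row
SELECT CASE WHEN (SELECT COUNT(*) FROM SalesHistory WHERE SalePrice > 1000) > 0
            THEN 1 ELSE 0 END

-- Form 2: EXISTS can stop at the first qualifying row
SELECT CASE WHEN EXISTS (SELECT 1 FROM SalesHistory WHERE SalePrice > 1000)
            THEN 1 ELSE 0 END

SET STATISTICS IO OFF

The form that produces fewer logical reads is usually the better starting point.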

Fine tuning

In a future article, I will look at how you can use the graphical execution plans created by SQL Server to fine tune your queries.

Tim Chapman is a SQL Server database administrator and consultant who works for a bank in Louisville, KY. Tim has more than eight years of IT experience, and he is a Microsoft certified Database Developer and Administrator. If you would like to contact Tim, please e-mail him at chapman.tim@gmail.com.


16 comments
sammesel

I've been using SQL Server since 1990, and I'm surprised to see a 'tip' stating to use JOIN instead of a subquery. This is a BIG - HUGE mistake. Whenever the SQL Server engine processes a JOIN it has to do a cartesian product among the tables, thus it uses a lot of resources (memory). When the engine executes a query with a subquery, the inner query will result in a small resultset, then the outer query will need to compare a column to those entries returned by the subquery, which will use much less server resources. A situation where you'll have difficulties using a subquery is where you need to use multiple fields to relate both tables (you can still use a subquery if all fields are 'string' type, through concatenation).

thisisfutile

Thanks again Tim. I've been following this line of posts because the topic has been speed. Thanks for pointing out the STATISTICS IO option. This is one of those "nuggets" that I always look forward to discovering, and now that I know about it, I plan on using it often. Also, I never would have guessed that using a transaction would actually speed up the execution; in fact, I would have put my money on the converse. I'm still learning (and feel I have a long way to go)! :-)

One question. In your "note" about transactions: "If there are too many statements inside a transaction, it will increase the duration of the transaction, which increases the amount of time locks are held on the objects inside of the transaction." Do you have an article that discusses this further? While I feel I understand the basics of transactions, I'm looking for more input on them because I've recently created a procedure to update customer accounts during our off-hours and it currently takes 7 minutes to complete. I have no idea if this is good or bad from a speed standpoint; all I know is that it works. I know it runs in one transaction, so I assume for 7 minutes all of our customer accounts are locked. I'm guessing that breaking it down into small groups of customer blocks (using a loop) would be safer and speed things up. Thanks for any references or input.

mekashifraza

I have written all the required code (connection, data adapter, dataset), but it is not working.

chapman.tim

Wow...this is probably the most incorrect comment I've ever had posted on an article of mine. If you have worked with SQL Server for that long, you should know how joins work...which doesn't seem to be the case here. A JOIN does in NO WAY do a cartesian product. A CROSS JOIN does a cartesian product...which is basically joining all rows from one table with all the rows from another. A JOIN finds those records in one table that match records in another table. A subquery does limit your resultset, but at the cost of removing duplicates. I'll show the exact difference between the two execution plans in a future article.

JBNY

I ran this query in the Query Analyzer for 6 hours and then gave up. 29 million recs at 24GB.

Update A Set Acctid = B.Acctid From B Where Substring(VendId,5,4) = B.VendId4

I did a link from Access and Access took 4 Hrs but it worked. Any suggestions will help me understand the SQL system.

Geoff Bomford

I have a foot in both camps! A specific subquery query in SQL7 returned a result in under a second; when the database was updated to SQL2005, the same query took over 10 seconds! I tried to optimize with new indices etc. but no joy. Converting the query to a JOIN resulted in acceptable performance in SQL2005. If you have any explanation for this I would be interested.

Tony Hopkinson

One column look-up by primary key, closed set of integers 1 - 1000? The colleague was testing large batch queries in a payroll engine. I figured there were a lot of ways that might go different, so I gave it about the simplest thing possible. 30% slower, go figure. 10,000 queries took 16 seconds ish using 2005 as a backend. The interesting thing is, it was about 2 seconds faster with prepared queries, but still 30% slower than 2000. Turned out in the apps I work on the hit was negligible, so must admit I didn't pursue. Unable to now as I don't have 2000 on my machine. I was scratching my head as well, tried all sorts of things to give 2005 a 'fair shake', it gave me two fingers. Oh and MS were not surprised when we pointed it out. Course that could have been another **** you from our ex partner.

chapman.tim

That's odd. I've never noticed a performance hit since I've been using 2005...and it seems strange that MS would have suggested using sprocs...simply because the manner in which statements are cached doesn't really require the use of them. A statement a sproc runs will be stored in cache the same way a statement from a batch is run. What was the output from STATISTICS IO when you ran the tests?

Tony Hopkinson

then I performed my own test. Wrote a wee app: one table, one column, int and an identity. Stuffed a thousand records in it. Generated a random number, selected it from the table 10,000 times. Did that for prepared and unprepared on 2000 and 2005 on the same machine. Also with both DBMS executing, and one or the other. Did it just about every favour I could; as far as I can make out, the performance hit was a deliberate choice on MS's part. MS response: use stored procedures. Aside from having to rewrite 8 years of legacy code, there's a move to go DB agnostic and provide MySQL as an option as well, so that would cause more problems than it solved. Cheers Bill. Thanks for f'all. :(

Geoff Bomford

and checked the compatibility mode. This was version 1 of SQL 2005, so maybe there were other issues too, I haven't tried to reproduce the problem with later versions. Do you have a reference for 2005 being 30% slower than 2000? I must say that my subjective impression would agree with this figure, and I've had to create lots more indices to try and get decent performance! I'll check if AUTO_UPDATE_STATISTICS is enabled.

Geoff Bomford

I know I tried everything I could think of at the time and re-writing the query to eliminate the sub-query was the only option that worked.

Tony Hopkinson

the upgrade. There was a known problem with 2005's optimisation engine being thoroughly misled by 2000's stats. Using the migrate turns off AUTO_UPDATE_STATISTICS; you need to turn it back on to generate decent execution plans. You might also want to check the database compatibility mode. One last note: 2005 is about 30% slower running ad-hoc queries than 2000.

chapman.tim

That's interesting that it was that large of a difference. I wouldn't be able to tell you for sure why without seeing the data structures and amount of data...but I'm guessing it might be some difference in the execution plan between the 2 versions. (Just a guess.) Does the table that you were using for the subquery contain a lot of data?
