Data Management

See SQL Server graphical execution plans in action

Tim Chapman identifies a few basic things to look for in a graphical execution plan to help you better understand how SQL Server uses indexes. He also offers ideas about how to make your queries faster.

Execution plans are one of the best tools to use to tune your SQL Server queries. In this article, I identify a few basic things to look for in a graphical execution plan to help you better understand how SQL Server uses indexes. I also offer ideas about how to make your queries faster.

Setting up my example

I need to create the table that I will be using throughout the examples and load some data into the table. I will load a fairly substantial amount of data into the table so that SQL Server will not just scan the table to search for data. Typically, if a table is relatively small, SQL Server will scan the table rather than deciding the best combination of indexes to use because it will take less time to scan the table.

CREATE TABLE [dbo].[SalesHistory]

(         

      [SaleID] [int] IDENTITY(1,1),         

      [Product] [varchar](10) NULL,               

      [SaleDate] [datetime] NULL,                

      [SalePrice] [money] NULL

)

GO

SET NOCOUNT ON

DECLARE @i INT

SET @i = 1          

WHILE (@i <=50000)

BEGIN                         

       INSERT INTO [SalesHistory](Product, SaleDate, SalePrice)                     

       VALUES ('Computer', DATEADD(ww, @i, '3/11/1919'),

       DATEPART(ms, GETDATE()) + (@i + 57))                

       INSERT INTO [SalesHistory](Product, SaleDate, SalePrice)         

       VALUES('BigScreen', DATEADD(ww, @i, '3/11/1927'),

       DATEPART(ms, GETDATE()) + (@i + 13))                     

       INSERT INTO [SalesHistory](Product, SaleDate, SalePrice)             

       VALUES('PoolTable', DATEADD(ww, @i, '3/11/1908'),

       DATEPART(ms, GETDATE()) + (@i + 29))                            

     SET @i = @i + 1

END

The following statement creates a clustered index on the SaleID column on the SalesHistory table. This is somewhat of an arbitrary choice for a clustered index on the table, but it makes sense for this example since the SaleID will be in increasing order for the rows entered. Creating this index will physically order the data in the table based upon the SaleID in ascending order.

CREATE CLUSTERED INDEX idx_SalesHistory_SaleID
ON SalesHistory(SaleID ASC)

Run this statement to turn on the IO statistics for our queries.

SET STATISTICS IO ON

In order to the view the execution plan for the queries I will run, I need to turn the option on. To do this, I right-click in the Query Editor window and select Include Actual Execution Plan. See Figure A. Figure A

Figure A

The following statement selects a row from the SalesHistory table based on a SaleID, on which there is a clustered index.

SELECT * FROM SalesHistory
WHERE SaleID = 9900
You can see the graphical execution plan and statistics for this query in Figure B. Figure B

Figure B

The query optimizer performed a Clustered Index Seek on the query, which is technically the fastest type of search that you will see on a table. Notice that there were three logical reads involved in finding this row in the table:

Table 'SalesHistory'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The script below searches for all records in the SalesHistory table where the product is PoolTable. There is currently no index on this column, and there are only three different values in this field for all of the rows in the table. The possible values found could be PoolTable, Computer, or BigScreen.

SELECT * FROM SalesHistory
WHERE Product = 'PoolTable'
The execution plan for the above query indicates that a Clustered Index Scan was performed. A Clustered Index Scan was performed because there is a clustered index on the table. Remember that a clustered index sorts the table in the order defined by the index keys. In this case, the clustered index really doesn't help me because a scan will still need to occur to find all of the instances where the product is equal to PoolTable. See Figure C. Figure C

Figure C

Table 'SalesHistory'. Scan count 1, logical reads 815, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

So, what will happen if we create an index on the Product column in SalesHistory table? If you run the following statement to create a nonclusterd index on this table, you will notice that nothing has changed, even with an index on the Product column. The execution plan and the number of logical reads are the same because of the selectivity of the values in the column. There are only three different values for all of the rows in this table, so an additional index isn't going to help our case because there are so many records to pull back. Indexes are most beneficial when they are declared on columns that have many distinct values.

CREATE NONCLUSTERED INDEX idx_SalesHistory_Product
ON SalesHistory(Product ASC)

I am dropping the index so it doesn't skew any results from the rest of my examples.

DROP INDEX SalesHistory.idx_SalesHistory_Product

The SaleDate in the SalesHistory table is reasonably unique based upon the script I used to generate the data. If I create an index on the SaleDate and search for a certain data, I expect that my search will use the index a lot more effectively than my indexes on the Product column.

CREATE NONCLUSTERED INDEX idx_SalesHistory_SaleDate

ON SalesHistory(SaleDate ASC)

SELECT SaleDate, Product FROM SalesHistory

WHERE SaleDate = '1971-05-11 00:00:00.000'

The execution plan below confirms my suspicion. Because of the higher selectivity of the SaleDate column, an index seek was used. This execution plan also displays what is known as a Bookmark Lookup. A Bookmark Lookup occurs when SQL Server has found the record in a nonclustered index that satisfies the search request, but columns are requested in the SELECT field list that are not included in the index, so SQL Server has to go out to disk and find the additional data. In this execution plan, the Index Seek finds the row, and the Clustered Index Seek is the Bookmark Lookup. In SQL Server 2000, this description is Bookmark Lookup, and in SQL Server 2005 SP2, it reads Key Lookup.

It's generally a good idea to avoid Bookmark Lookups if you can because of the overhead of the extra reads involved in finding the data you need. The typical fix for them is to include the extra columns that you are reading in the index used to find the data. This is not going to work for every case, so it will take some effort to find a balance that suits your needs. See Figure D. Figure D

Figure D

Table 'SalesHistory'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Now I want to see the execution plan for the same query but without a clustered index on the SaleID column, so I will need to drop the clustered index.

DROP INDEX SalesHistory.idx_SalesHistory_SaleID

SELECT SaleDate, Product FROM SalesHistory

WHERE SaleDate = '1971-05-11 00:00:00.000'
You can see from the execution plan that the Clustered Index Seek turns into an RID Lookup. It is still a Bookmark Lookup, but it is labeled differently because the table doesn't have a clustered index. One thing that is very interesting about this query is the number of logical reads produced. This query only produces four logical reads, while the previous query with the clustered index produced five logical reads. The extra read is due to the fact that the clustered index had to be used to find the row of data. Anytime a nonclustered index is accessed on a table that has a clustered index, the clustered index is ultimately used to find the rows for which you're searching. Because the SalesHistory table currently doesn't have a clustered index, this query was one logical read faster. See Figure E. Figure E

Figure E

Table 'SalesHistory'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

This does not mean that having a clustered index may be slowing your queries down. Every table should have a clustered index because it sorts the data in order and aids the other indexes with their searches. This single query happened to be a little bit quicker because there was no clustered index on the table. The increased speed is negligible in this case, and almost any other search on the table will be aided by a clustered index.

Because my table doesn't currently have a clustered index, I want to recreate the one that I had on the table. See the following code:

CREATE CLUSTERED INDEX idx_SalesHistory_SaleID
ON SalesHistory(SaleID ASC)

Table 'SalesHistory'. Scan count 1, logical reads 811, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'SalesHistory'. Scan count 1, logical reads 814, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The SalesHistory table was accessed twice by the clustered index recreation because, when a clustered index is added to a table, any nonclustered indexes on that table will need to be rebuilt. A nonclustred index must first access the clustered index before it can return data. The nonclustered index that I had on the SaleDate column was rebuilt after the clustered index was built. See Figure F. Figure F

Figure F

Tuning your queries

The examples in this article are not a definitive guide for building indexes or reading execution plans and query tuning. My goal is to point you in the right direction to learn about tuning your own queries. Query tuning isn't an exact science, and the needs of your database will change over time. The more you experiment with the possible combinations of indexes, the more successful you will be.

Next time, I plan to take another look at graphical execution plans and how to read them when table joins occur.

Tim Chapman a SQL Server database administrator and consultant who works for a bank in Louisville, KY. Tim has more than eight years of IT experience, and he is a Microsoft certified Database Developer and Administrator. If you would like to contact Tim, please e-mail him at chapman.tim@gmail.com.

----------------------------------------------------------------------------------------- Get SQL tips in your inbox TechRepublic's free SQL Server newsletter, delivered each Tuesday, contains hands-on tips that will help you become more adept with this powerful relational database management system. Automatically subscribe today!

About

Tim Chapman is a SQL Server MVP, a database architect, and an administrator who works as an independent consultant in Raleigh, NC, and has more than nine years of IT experience.

7 comments
jayrajvakil
jayrajvakil

This was really a good articale Tim. Thanks for providing a good starting point on Query optimization. This may be just another article for the seasoned DBAs on TR but a great insite for a novice like me. I will be waiting for your take on understanding performance of Joins.

chapman.tim
chapman.tim

Have a look at this articles on TR (not written by me) that cover indexes. Let me know if these answer your questions. If not, I'd be happy to include an article that details the differences between the two types of indexes. Indexing SQL Server relational databases for performance http://articles.techrepublic.com.com/5100-6313_11-5148062.html Database Optimization: Increase query performance with indexes and statistics http://articles.techrepublic.com.com/5100-6313_11-5146588.html

thisisfutile
thisisfutile

I'm still following (and enjoying) this series Tim. Thanks again for more good info. There are probably many SQL Server DBAs who know detail like you do and may gather little or nothing from this series, but this is the perfect mix of general and detail info to keep my interest and I find it extremely valuable. I'm looking forward to the next article. Question: If you haven't done it in another article already, could you expand on Clustered and non-clustered indexes? If you have, could you point me to your article(s)? I'm curious because while I work closely with a fairly large database, I didn't have anything to do with setting it up. When it comes to indexes, I still have a lot to learn and this article represents my first peek into the affects of indexes on a table (the date column was a good example...I tried the Select statement with and without the index present and the difference was very noticeable). I was especially curious when you said that adding a clustered index to a table means that all non-clustered indexes need to be rebuilt. If you plan to discuss this later, as usual, I'll continue to follow this series and wait patiently. Thanks, Gabe

thisisfutile
thisisfutile

That would be the sound of my head as my mind is expanding. :-P Thanks for the information about indexes Tim. Finally, index articles and discussions are making more sense. It feels good to know that I no longer have to fear them, however just my introduction has me asking dozens of new questions. Thanks for opening the door. Again, I look forward to your future articles.

Shellbot
Shellbot

I've only recently started following your stuff..it helps me out a bit here and there. I second the suggestion of doing some stuff on Clustered and non-clustered indexes. In theory..i can recite what the books say..but as I've inherited the DB, in practice (and very little practice at indexing)..they make my head hurt!

chapman.tim
chapman.tim

I'm happy to hear that you like my articles. I don't think I have any articles that outline the difference between the two types indexes. Basically, a clustered index orders the data in the table based upon how you define that index. Searching with a clustered index is great for range searches based upon the index key because the data will be right next to each other. You can only have one clustered index per table. A NC index, on the other hand, uses the same type of index structure as a clustered index, but does not sort the data in order. Instead, it uses a row pointer to the location of the data page that it needs to return data, and uses the clustered index key as a tool to locate that data. You can have up to 255 (I think) nonclustered indexes on a table. Does that make sense?

thisisfutile
thisisfutile

Thanks for the explanation. Now Figure E makes even more sense to me because I didn?t understand that relationship between clustered and non-clustered indexes. Also, I referenced your comment "when a clustered index is added to a table, any nonclustered indexes on that table will need to be rebuilt" in my previous post and the NC index being a pointer explains why the NC index(es) on the table need rebuilt after a clustered index is defined. I love it when the fog of confusion begins to lift! :-)

Editor's Picks