Data Management

Filtered Indexes in SQL Server 2008

Filtered indexes are a neat new feature in SQL Server 2008 that allows you to define indexes on subsets of data. In today’s article, database architect Tim Chapman shows how you can take advantage of this useful new feature.

 A filtered index is a non-clustered index created on a well-defined subset of data in a SQL Server table object.  By "well-defined", I am talking about those sets of data that are used exclusively to satisfy query criteria.  For example, if you have a field in a table that contains predominately NULL values, you may benefit from creating a filtered index that only contains those values that are NOT NULL.  Note that you cannot define a clustered index with a filter.

Why a filtered index?

Filtered indexes can provider performance gains in those scenarios where a majority of queries on a table filter on a specific subset of data.  These indexes are likely going to be much smaller than an index on the entire field, so there is less index storage involved.  Also, filtered indexes are likely going to take less work to maintain.  Because the filtered index will be smaller, data manipulation operations will affect smaller portions of the index, making these operations less costly in terms of database I/O.

Creating a filtered index

Let's take a look at how to create a filtered index, and how we can see some performance benefits from its use.  First, run the following script to create the SalesHistory table and populate it.

IF OBJECT_ID('SalesHistory', 'U') IS NOT NULL

DROP TABLE SalesHistory

GO

CREATE TABLE [dbo].[SalesHistory]

(

[SaleID] [int] IDENTITY(1,1),

[Product] [varchar](10) NULL,

[SaleDate] [datetime] NULL,

[SalePrice] [money] NULL,

CONSTRAINT PK_SalesHistory_SaleID PRIMARY KEY CLUSTERED (SaleID ASC)

)

GO

SET NOCOUNT ON

BEGIN TRANSACTION

DECLARE @i INT

SET @i = 1

WHILE (@i <=5000)

BEGIN

INSERT INTO [SalesHistory](Product, SaleDate, SalePrice)

VALUES ('Computer', DATEADD(ww, @i, '3/11/1919'),

DATEPART(ms, GETDATE()) + (@i + 57))

INSERT INTO [SalesHistory](Product, SaleDate, SalePrice)

VALUES('BigScreen', DATEADD(ww, @i, '3/11/1927'),

DATEPART(ms, GETDATE()) + (@i + 13))

INSERT INTO [SalesHistory](Product, SaleDate, SalePrice)

VALUES('PoolTable', DATEADD(ww, @i, '3/11/1908'),

DATEPART(ms, GETDATE()) + (@i + 29))

SET @i = @i + 1

END

COMMIT TRANSACTION

GO

With some records in my SalesHistory table, I am going to update the SaleDate to NULL for 6 out of every 7 records in the table.  This will give me a pretty sparse distribution of values in the SaleDate field.  After the update I will then create a normal nonclustered index on the SaleDate field.

UPDATE SalesHistory

SET SaleDate = NULL

WHERE (SaleID % 7) > 0

GO

CREATE INDEX idx_SalesHistory_SaleDate

ON SalesHistory(SaleDate)

Run the following command to run IO statistics on.  This allows you to view IO values for each TSQL command ran.

SET STATISTICS IO ON

The following query returns all rows from the SalesHistory table where the SaleDate contains a value.  This query uses the idx_SalesHistory_SaleDate index we created earlier, and uses an Index Seek operation to return 2142 rows.  This query requires 8 logical reads from the database to return the necessary rows.

Note that the queries used in this article only return the SaleDate field in the resultset.  The reason for this excluding or including more fields in the SELECT list will alter the execution plan.  So, for the purposes of this article, I will only return the field for which I am setting criteria.

SELECT SaleDate

FROM SalesHistory

WHERE SaleDate IS NOT NULL

I can view index related data for my idx_SalesHistory_SaleDate index through querying some system views.

SELECT i.name, p.rows, i.filter_definition

FROM

sys.partitions p

JOIN sys.indexes i ON p.object_id = i.object_id AND p.index_id = i.index_id

WHERE

OBJECT_NAME(i.object_id) = 'SalesHistory'

AND i.name = 'idx_SalesHistory_SaleDate'

The following query is similar to the query specified above, but filters those rows where the SaleDate does contain a NULL value.  This query also does an index seek on the index I created above, but this time requires 31 logical reads from the database as the query returns 12858 records.

SELECT SaleDate

FROM SalesHistory

WHERE SaleDate IS NULL

Now that I've looked a little bit how nonclustered indexes work, I'll take a look at how you can user fliteres when defining the nonclustered index to index only subsets of data.  First, I'll need to drop the index I created above.

DROP INDEX SalesHistory.idx_SalesHistory_SaleDate

In the following script I create a filtered nonclustered index on the SaleDate field.  This index will contain data pointers for ONLY those records for which the SaleDate IS NOT NULL.  This means that for any records where the SaleDate IS NULL, the index will not be considered at all.

CREATE INDEX idx_SalesHistory_SaleDate

ON SalesHistory(SaleDate)

WHERE SaleDate IS NOT NULL

In the following query, an index scan of the idx_SalesHistory_SalePrice is used with 7 logical database reads.  So, even though an index scan was performed, the operation took less logical reads due to the filtered index.

SELECT SaleDate

FROM SalesHistory

WHERE SaleDate IS NOT NULL

I can query the same system table query as before to view the number of records contained in the index.  The previous index contained all 15,000 records from the table, whereas the current index contains only records where the SaleDate IS NOT NULL (2142 records in this case).

SELECT i.name, p.rows, i.filter_definition

FROM

sys.partitions p

JOIN sys.indexes i ON p.object_id = i.object_id AND p.index_id = i.index_id

WHERE

OBJECT_NAME(i.object_id) = 'SalesHistory'

AND i.name = 'idx_SalesHistory_SaleDate'

In the following query, I look for those records where the SaleDate IS NULL.  Remember that the index I defined earlier only contains those records where the SaleDate IS NOT NULL, so it will not be considered for this query.  In fact, a clustered index scan is used to find the records where the SaleDate IS NULL, resulting in 79 logical database reads.

SELECT SaleDate

FROM SalesHistory

WHERE SaleDate IS NULL

Conclusion

The new filtered index feature in SQL Server 2008 is a very useful new feature.  It allows you to create indexes on only subsets of frequently used data.  However, these types of indexes should be used with care.  In almost all circumstances, a normal non-clustered index will be the more useful index to use rather than a filtered on.  Only after you are comfortable with the data usage patterns of your database should you consider creating filtered indexes, otherwise you may cause yourself

About

Tim Chapman is a SQL Server MVP, a database architect, and an administrator who works as an independent consultant in Raleigh, NC, and has more than nine years of IT experience.

0 comments

Editor's Picks