Using advanced table hints in SQL Server

Tim Chapman explores the benefits and side effects of using three types of table hints when running queries in SQL Server: READCOMMITTED, REPEATABLEREAD, and SERIALIZABLE.

My previous SQL Server column covered two types of table hints: NOLOCK and READPAST. This article explores the advantages and drawbacks of using three other types of table hints when running queries in SQL Server: READCOMMITTED, REPEATABLEREAD, and SERIALIZABLE.

To look at these table hints in action, I'll set up a table and load the table with data. Listing A contains the code to create the table and load it.

READCOMMITTED

This is the default table locking behavior used by SQL Server. This type of isolation uses shared locks when reading data to ensure that only data that is committed to the database have been read. These locks are used because updates required exclusive locks on the data that they are modifying, which block any readers that try to use shared locks. For most general querying purposes, this isolation level does a good job of balancing database consistency and concurrency; however, it does have its faults, which include Non-Repeatable reads and phantom reads.

Get SQL tips in your inbox
TechRepublic's SQL Server newsletter, delivered each Tuesday, contains hands-on tips that will help you become more adept with this powerful relational database management system.
Automatically sign up today!

Below is an example of how READCOMMITTED works. In this example, you will need to devote a query editor window or database connection to Session1 and another for Session2.

Session1:
BEGIN TRANSACTION
UPDATE SalesHistory
SET SalePrice = SalePrice + 1
WHERE SaleID = 201

Session2:
BEGIN TRANSACTION
SELECT * FROM SalesHistory WITH(READCOMMITTED)

This query will return approximately the first 200 records in the table. (It may not return exactly 200 records because the first session is likely using page locks, which is locking data pages of data, rather than individual rows.) Once the first 200 or so rows are returned, the query execution will remain running, but no additional rows will be returned. The query is waiting until the transaction that is locking from the first update statement has released its locks. I am able to see the data from the first 200 or so records returned. Note that I have created a transaction that is still open.

Session1:
UPDATE SalesHistory
SET SalePrice = SalePrice + 2
WHERE SaleID = 1
COMMIT TRANSACTION

This update statement is still waiting the scope of the initial transaction. I am updating the record with the SaleID of 1, which will be one of the rows returned in the transaction that began in Session2. As soon as I commit this transaction, the query from Session2 will complete.

Session2:
SELECT * FROM SalesHistory WITH(READCOMMITTED)
COMMIT TRANSACTION

I issue the same select query again. This time, the SalePrice returned for the row with SaleID = 1 is different, even though it is still within the same transaction. This is what is known as a Non-Repeatable read. The data that I am working on in the same transaction has changed outside of the transaction and may adversely affect the transaction later. This is one of two possible side effects with this type of transaction isolation level — the other type being "phantom" reads. (I'll go into more detail about phantom reads in the section about REPEATABLEREAD.)

REPEATABLEREAD

The REPEATABLEREAD table hint can deter Non-Repeatable reads. The following example shows this table hint in action. (This is actually the same example from the READCOMMITTED section, but it has much different results.)

Session1:
BEGIN TRAN
UPDATE SalesHistory
SET SalePrice = SalePrice + 1
WHERE SaleID = 201

Session2:
BEGIN TRAN
SELECT * FROM SalesHistory WITH(REPEATABLEREAD)

Session1:
UPDATE SalesHistory
SET SalePrice = SalePrice + 2
WHERE SaleID = 1

The query from Session 2, which uses the REPEATABLEREAD table hint, causes problems. When this table hint is issued, it guarantees that, for the records that were already returned by the transaction, these records cannot be changed by outside transactions. This ensures that these records can be read repeatedly, and the data will not have changed. The statement above causes a deadlock scenario, and the SQL Server database engine will pick one of the sessions as the deadlock victim.

While the REPEATABLEREAD isolation level prevents a Non-Repeatable read, it does not prevent phantom reads. It guards the data that has already been returned by the transaction, but it does not guard against new records in the previously returned result set. This phantom read scenario occurs when one transaction has a range or rows of an entire table locked in a transaction, and a separate transaction inserts a row into that locked range or rows in the table. Any subsequent reads by the first transaction result in the ability to see the new row, even though it was not present in the original read. Let's take a look at how a phantom read can occur.

Session1:
BEGIN TRANSACTION
SELECT COUNT(*) FROM SalesHistory WITH(REPEATABLEREAD)

The script above begins a transaction and reads all of the data in the SalesHistory table, using the REPEATABLEREAD table hint. This transaction will lock the entire table.

Session2:
BEGIN TRANSACTION
INSERT INTO SalesHistory(Product, SaleDate, SalePrice)
SELECT 'Computer', GETDATE(), 500
COMMIT TRANSACTION

This script creates a transaction, inserts a record into the locked table, and then commits the transaction:

Session1:
SELECT COUNT(*) FROM SalesHistory WITH(REPEATABLEREAD)
COMMIT TRANSACTION

If I run the same SELECT statement from the first session window, I will receive all 300 records that were received in the first query, but the additional record that were inserted in the Session2 window will also be included. A phantom read has occurred.

SERIALIZABLE

The previous examples move from higher concurrency to lower concurrency and from lower consistency levels to higher consistency levels. In my last article, I discussed how using NOLOCK doesn't lock any records being read, which ensures the highest level of concurrency (i.e., number of users that can read and modify the data) in the database. NOLOCK also allows for: the lowest consistency, dirty reads, Non-Repeatable reads, and phantom reads.

The SERIALIZABLE table hint is at the other end of the spectrum from the NOLOCK table hint. When you use the SERIALIZABLE table hint (also known as HOLDLOCK), it guarantees that no other transaction can modify or read uncommitted data in the current transaction. In other words, transactions must wait for other transactions to complete before completing their work. This drastically limits database concurrency and puts a premium on database consistency.

Take a look at the same example I used for the last example:

Session1:
BEGIN TRANSACTION
SELECT COUNT(*) FROM SalesHistory WITH(SERIALIZABLE)

This script returns all 300 records in the table while locking the table with the SERIALIZABLE table hint:

Session2:
BEGIN TRANSACTION
INSERT INTO SalesHistory(Product, SaleDate, SalePrice)
SELECT 'Computer', GETDATE(), 500
COMMIT TRANSACTION

This session tries to insert a record into the SalesHistory table, which is currently locked from Session1. Because of the blocking, this session will wait continuously until the transaction from Session1 has completed. This table hint prevents all other concurrency side effects that I previously discussed — dirty reads, Non-Repeatable reads, and phantom reads. The price you pay for this high data consistency is that other transactions must wait for the locking transaction to complete. This can drastically affect performance due to the reduced concurrency.

Recap

Higher concurrency levels increase the ability of many users to access data at the same time, but it also increases the number of possible side effects that may develop. At the same time, a higher consistency level can reduce the concurrency effects at the cost of more resource consumption due to maintaining locks, and it increases the chances that one transaction will block another.

Tim Chapman is a SQL Server database administrator who works for a bank in Louisville, KY, and has more than 7 years of IT experience. He is also Microsoft certified in SQL Server 2000 and SQL Server 2005. If you would like to contact Tim, please e-mail him at chapman.tim@gmail.com.