Data Management

Find blocking processes using recursion in SQL Server 2005

SQL Server 2005 has several system tables and views that can provide a ton of insight into blocking problems that you may be facing in your database environment. Through these procedures, you can find blocking chains. Learn more about blocking processes.

Blocking occurs in SQL Server 2005 when one process has one or more records locked, while another process tries to acquire locks on the same set (or subset) of records. This causes the second process to wait until the first process is done so it can acquire the records; this can cause other processes to be blocked if they are waiting on the resources that the second connection has acquired. These processes are ultimately dependent upon the first process to complete its work before they can complete theirs. This can create a daisy-chain of processes waiting to complete their work.

This blocking situation is a bit different than a deadlock scenario. A deadlock occurs when process A has ownership of resources that process B needs. Process B, in turn, owns resources that process A needs to complete its work. In theory, this would last forever. Fortunately, SQL Server 2005 has algorithms to find these scenarios and stop one process so the other can finish its work. On the other hand, blocking can last a lot longer than a deadlock because it isn't stopped automatically by SQL Server 2005. The blocked processes must wait until the blocking connection finishes its work before they can finish theirs.

Finding blocking connections

SQL Server 2005 has several system tables and views that can provide a ton of insight into blocking problems that you may be facing in your database environment.

Through these procedures, you can find blocking chains. I define a blocking chain as one particular process is blocking one or more processes, and those processes are blocking other processes. These blocking chains happen all the time, especially when there are instances of long-running transactions. Many times, I have seen a DBA find the process that is causing the most locking and stop the process. This is usually not a great idea, and it is definitely not the way to find the root of the problem. The processes that cause the most locking are usually doing so because they are being blocked by another process.

The following statement pulls information from the sys.sysprocesses system table for user processes. The information is placed into a temporary table named #Processes that I will use for my recursive Common Table Expression (CTE) query. The sys.sysprocesses table is a great tool because it contains a lot of data in one table. (The sys.sysprocesses table may not be included in future versions of SQL Server.) I can also use the CROSS APPLY construct to get the actual SQL statement that is being used based upon the sql_handle value in the table.

SELECT

            s.spid, BlockingSPID = s.blocked, DatabaseName = DB_NAME(s.dbid),

            s.program_name, s.loginame, ObjectName = OBJECT_NAME(objectid,                      s.dbid), Definition = CAST(text AS VARCHAR(MAX))

INTO        #Processes

FROM      sys.sysprocesses s

CROSS APPLY sys.dm_exec_sql_text (sql_handle)

WHERE

           s.spid > 50

The following statement declares and uses the recursive CTE for finding blocking chains in the temp table I just created for the sys.sysprocesses table.

The WITH statement declares the CTE with the code inside of the parentheses being the body of the expression. In the first SELECT statement, I am finding all processes that are not blocked themselves but are blocking at least one other process. To do this, I just join the #Processes temp table to itself based on the SPID field and the BlockingSPID field. This is a crucial step in the query, as I am locating the starting point for finding the chain of processes that are being blocked.

This first SELECT can essentially be thought of as a "base case," which will act as a root for the rest of the recursion. This SELECT has a couple of functions that will allow me to tell the hierarchy of the recursion. The RowNo field uses the ROW_NUMBER function which will indicate the separate number of processes that are blocking other processes. The LevelRow, which I initially declare as 0, will indicate the recursion level for that particular blocking process. (This information will make more sense later in the article when I give an example of the information that is provided by the CTE.)

The next SELECT statement after UNION ALL (there always has to be a UNION ALL in a recursive CTE) is the recursive section; it joins what is returned by the CTE and the #Processes temp table based upon the BlockingSPID from the temp table and the SPID field from the CTE. For this statement, I am only concerned with the blocked statements.

The last SELECT statement runs the recursive CTE and returns the values.

WITH Blocking(SPID, BlockingSPID, BlockingStatement, RowNo, LevelRow)

AS

(

     SELECT

      s.SPID, s.BlockingSPID, s.Definition,

      ROW_NUMBER() OVER(ORDER BY s.SPID),

      0 AS LevelRow

    FROM

      #Processes s

      JOIN #Processes s1 ON s.SPID = s1.BlockingSPID

    WHERE

      s.BlockingSPID = 0

    UNION ALL

    SELECT

      r.SPID,  r.BlockingSPID, r.Definition,

      d.RowNo,

      d.LevelRow + 1

    FROM

      #Processes r

     JOIN Blocking d ON r.BlockingSPID = d.SPID

    WHERE

      r.BlockingSPID > 0

)

SELECT * FROM Blocking

ORDER BY RowNo, LevelRow

Analyzing the sample code

Figure A shows an example of a potential output by the recursive CTE declared above. In the output, there are two processes blocking other processes; this is identified in a couple of ways. First, there are two records that have a 0 in the BlockingSPID field. These records will have the value 0 in the LevelRow field; they are also the "base case" for the recursion. The RowNo field indicates the separate set of blocking groups. The first set of statements that are blocked have the value 1 for the RowNo field, the second set of statements that are blocked have the value 2 for the RowNo field, etc. Figure A

Figure A

In analyzing the first group of blocked statements, I see that procedure usp_StoredProcedure1 is the initial point of blocking -- that procedure is blocking usp_StoredProcedure2. The LevelRow field can be used here, or I can try to match the SPID in one row to the BlockingSPID value in the next row. I find that the LevelRow field is the easier of the two choices. I also see the following:

  • usp_StoredProcedure2 is blocking usp_StoredProcedure3.
  • usp_StoredProcedure3 is blocking usp_StoredProcedure4.
  • usp_StoredProcedure 4 is blocking two stored procedures: usp_StoredProcedure5 and usp_StoredProcedure14. The LevelRow value is the same for these two rows.
  • The blocking analysis concludes for RowNo 1 with usp_StoredProcedure5 blocking the final statement usp_StoredProcedure6.

Using this tool

There are two very useful ways to use this tool for finding blocking on your SQL Server 2005 system. One way is to run it every couple of minutes and store the results in a table for later analysis. This is very handy if you encounter blocking problems during the night because it allows you to find the problem the next day. The other way is to wrap the functionality up into a system stored procedure that you can easily call anytime if someone is complaining about sessions timing out or the database engine running slow.

Tim Chapman a SQL Server database administrator and consultant who works for a bank in Louisville, KY. Tim has more than eight years of IT experience, and he is a Microsoft certified Database Developer and Administrator. If you would like to contact Tim, please e-mail him at chapman.tim@gmail.com.

-----------------------------------------------------------------------------------------

Get database tips in your inbox

TechRepublic's free Database Management newsletter, delivered each Tuesday, contains hands-on SQL Server and Oracle tips and resources. Automatically subscribe today!

About

Tim Chapman is a SQL Server MVP, a database architect, and an administrator who works as an independent consultant in Raleigh, NC, and has more than nine years of IT experience.

6 comments
jbienn
jbienn

In studying the results of the query, I also have another question. This concerns the data in the "BlockingStatement" column. From what I can tell, it is only the blockING statement when LevelRow is 0. Otherwise, it is just the blockED statement. The data/SQL in the BlockingStatement column belongs to the SPID (which is the BlockED SPID), not the BlockingSPID. This is truly a blocking statement at level 0 because it is the statement that is blocking everything else. But it seems like you wouldn't label it the blocking statement for all other LevelRow values. Make sense or am I missing something? Thanks, - JJ

jbienn
jbienn

I think I found a mini-bug in the main query. I've continued to work on this code since my first two postings. I created #Processes from what was in my server and then manually added numbers to the BlockingSPID column to create some chains of blocking. Then I ran the main recursive query and studied the output. I'm not sure if the following is a good example/realistic data, but I just happened to have my two root SPIDs ("base camp" results) blocking two other processes. In other words, the data looked a bit like this: (not sure if this will format OK since my code in my previous postings did not work out): SPID BlockingSPID 51 0 52 51 56 51 67 0 66 67 69 67 My chains were longer than this, but this is what you need to see to get my point. If you run the base camp query alone, you would expect to get 2 rows (the row with SPID 51 and 67). Instead, the query returns 4 rows with 51 and 67 twice. This causes the final query to have double of everything. I resolved this by doing a group by in base camp query. I would post my code here, but the formatting gets messed up. Any thoughts? Thanks, - JJ

jbienn
jbienn

I didn't want to enter too big of a post, so I broke out my questions. This posting is asking about the return data and testing and understing the return data. I can't read the picture in your article, so I can't review your sample data. And when I run the function multiple times on my own server, I am not getting any return rows. (Hopefully that means we do not have any blocking as opposed to meaning I messed up the SQL somewhere.) So, I can't figure out how to make sure I understand what the return results really mean. Any suggestions? One idea: in the example for your article: could you post the original rows for #Processes? That way I could re-create the final answer myself. Thanks, - JJ

jbienn
jbienn

Thank you for this posting. I was excited about the idea because the current information I get when my server has blocks of 3 seconds or longer is for the blockED statement/SQL, not the blockING statement. Getting the blocking information is more helpful in my opinion. I tried to take your code (giving you full credit) and put it into a table-valued user defined function that we could incorporate into our existing blocked traffic code. Of course, I didn't want to do that without fully understanding your code. One way for me to "own" code is to tinker with it. I tinkered with the first query in two ways: made it fit my own naming/formatting standards and tried to use systems DMVs instead of the system table. This is my first attempt at trying to use DMVs myself. On top of that, while I'm not new to SQL Server, I'm not clear on the difference between concepts like sessions, connections, and requests. (I'm not a DBA.) In my quick tests, the query below returns the same data as your query (except for the DatabaseNm column). My first question is: just off the top of your head, do you see anything of concern, any red flags, for this query? It looks OK to me, but am I missing something? SELECT sc.session_id AS UserSPID, sr.blocking_session_id AS BlockingSPID, DB_NAME(st.dbid) AS DatabaseNm, ss.program_name AS AccessingProgram, ss.login_name AS LoginNm, OBJECT_NAME(st.objectid, st.dbid) AS ObjectNm, CAST(st.text AS VARCHAR(MAX)) AS RequestSQL --INTO #Processes FROM sys.dm_exec_sessions ss JOIN sys.dm_exec_connections sc ON ss.session_ID = sc.session_ID LEFT JOIN sys.dm_exec_requests sr ON sc.connection_ID = sr.connection_ID CROSS APPLY sys.dm_exec_sql_text (sc.most_recent_sql_handle) AS st WHERE sc.session_id > 50 --gets us user processes My second question is about the DatabaseNm column. Out of say 225 returned rows, only about 2 are populated with the database name, compared to all the rows when I run your query. I wonder if there is a way to rectify the different/get database name information on more rows. At the same time, I wonder if it even matters in the big picture. Here are my thoughts as I wrote them into the comments of my UDF: NOTE: Aternatively, the DatabaseNm line could be: DB_NAME(sr.database_ID) AS DatabaseNm, The deal: According to BOL, dbid from sys.dm_exec_sql_text (ie st.dbid) IS NULL for ad hoc and prepared SQL statements. Since many of our SQL statements are ad hoc or prepared, we do not get the database information very often when the above statement is run. On the other hand, sr.database_ID should return database information *for every record in sys.dm_exec_requests*. The draw back is that there are very few records in sys.dm_exec_requests compared to the other DMVs used above and so once again you get little database name information. On the other hand, the final query used below only returns records when there are values in sys.dm_exec_requests, (that's where the blocking_ID info comes from), so it may not matter that sys.dm_exec_requests has fewer records. It may be the exact data that we need. Not sure of the correct answer. Any thoughts? Thanks!!!

chapman.tim
chapman.tim

Good catch. This bit of code should do the trick..although I am sure your way works too. IF OBJECT_ID('tempdb..#Processes') IS NOT NULL DROP TABLE #Processes

philip.kelley
philip.kelley

Cutting and pasting, the code works perfectly. Run it a second time (after you've tried to fix the block), and it fails because the temp table already exists. Rather than add a "if #table exists drop #table" statement, and to remove its (admittedly minimal) overhead, I changed that first query to a second (or rather, the first) CTE in the WITH clause, and referenced it in the second recursive query. Messier to read (and to explain), but tighter code. Philip