Banking

Setting up Change Data Capture in SQL Server 2008

Change Data Capture (CDC) is a new native feature in SQL Server 2008 Enterprise Edition. Tim Chapman shows how to use CDC to easily set up data auditing on your SQL Server system.

Auditing database data is no easy task, but it's absolutely required for most industries. In a previous column, I showed how you can use some native SQL Server functionality to do a decent job of capturing data changes.

The inherent problem with capturing these data changes is that it requires triggers on the base table. These triggers can either immediately enter the data into auditing tables or use Service Broker functionality to capture the data at a later time. In either event, the initial capture has to be done in the scope of the original transaction. In SQL Server 2008, the Change Data Capture (CDC) feature allows you to capture data changes much more easily and without the use and overhead of triggers.

Setting up CDC

First, I want to create a database for the use of testing my scenarios. The statement below creates a database named CaptureChanges on your database instance:

CREATE DATABASE CaptureChanges
GO

To use CDC, I'll need to enable it at the database level. The statement below calls the sp_cdc_enable_db system stored procedure, which enables CDC for the current database scope:

USE CaptureChanges

GO

EXEC sys.sp_cdc_enable_db

GO

To illustrate how CDC captures data changes, I'll need a table to perform data changes. The script below creates a table named SalesHistory and inserts data into it:

IF OBJECT_ID('SalesHistory') IS NOT NULL

DROP TABLE SalesHistory

GO

CREATE TABLE [dbo].[SalesHistory]

(               

            SaleID int IDENTITY(1,1) PRIMARY KEY,               

            Product varchar(10) NULL,               

            SaleDate datetime NULL,               

            StatusID TINYINT NULL,               

            SalePrice money NULL

)

            GO

SET NOCOUNT ON

BEGIN TRANSACTION

            DECLARE @i INT

            SET @i = 1

            WHILE (@i <=500)

            BEGIN               

                        INSERT INTO [SalesHistory](Product, SaleDate, SalePrice, StatusID)               

                        VALUES ('Computer', DATEADD(ww, @i, '3/11/1919'), DATEPART(ms, GETDATE()) + (@i + 57), 1)               

                       

                        INSERT INTO [SalesHistory](Product, SaleDate, SalePrice, StatusID)               

                        VALUES('BigScreen', DATEADD(ww, @i, '3/11/1927'), DATEPART(ms, GETDATE()) + (@i + 13),5)               

                       

                        INSERT INTO [SalesHistory](Product, SaleDate, SalePrice, StatusID)               

                        VALUES('PoolTable', DATEADD(ww, @i, '3/11/1908'), DATEPART(ms, GETDATE()) + (@i + 29),8)               

                       

                        SET @i = @i + 1

                       

            END

COMMIT TRANSACTION
GO

At this point, nothing in the SalesHistory table has been captured, as it is yet to be enabled for CDC. To use CDC for a particular table, the table must contain a Primary Key constraint or a unique constraint to uniquely identify the rows in the table. In the example above, I use an IDENTITY column for my Primary Key constraint.

To enable the table for data capturing, I'll need to run a system stored procedure and pass in the table name. (The SQL Server Agent should be running when you run this stored procedure.)

EXEC sys.sp_cdc_enable_table

@source_schema = N'dbo',

@source_name   = N'SalesHistory',

@role_name     = NULL,

@supports_net_changes = 1

GO

Now I'm going to update an arbitrary set of records in the SalesHistory table. Because I now have CDC enabled for this table, the data changes should be captured by the CDC system.

UPDATE s

SET SalePrice = SalePrice + 1

FROM SalesHistory s

where SaleID % 13 = 0

If the CDC schema had not already existed in the database, the above procedure call would create it. All CDC tables will belong to this schema.

When the CapturesChanges table is enabled for CDC, two SQL Server Agent jobs are created. One job uses the LogReader to capture changes while the other job is used to clean up older messages. The job names created are listed below:

cdc.CaptureChanges_capture
cdc.CaptureChanges_cleanup

CDC has system tables that are created when CDC is enabled. The main one to concern yourself with at first is the actual data that stores the audited data. A SalesHistory_CT table is created when I enable the SalesHistory table for CDC. Data changes are captured in this table, along with a few housekeeping columns that CDC uses for reporting changes to data. Notice that this table belongs to the cdc schema.

TechRepublic's Servers and Storage newsletter, delivered on Monday and Wednesday, offers tips that will help you manage and optimize your data center. Automatically sign up today!

About Tim Chapman

Tim Chapman is a SQL Server MVP, a database architect, and an administrator who works as an independent consultant in Raleigh, NC, and has more than nine years of IT experience.

Editor's Picks

Free Newsletters, In your Inbox