Almost all industries require some type of data auditing in one form or another. This is especially true in the financial and health care industries, where changes in data can have critical effects.  Sarbanes-Oxley compliance is a great example of how important data auditing is. Auditing data is important for processes other than just the retention of data. Extract, Transformation, and Loading (ETL) activities that incrementally load altered data into the data warehouse typically have to make use of some type of auditing to identify those records where the data has changed. Change Data Capture (CDC), a brand new feature in SQL Server 2008, features the ability to capture and store structure and data alterations in your SQL Server system.

What is Change Data Capture?

Change Data Capture (CDC), a wonderful new feature in SQL Server 2008, provides the ability to set up and manage database data auditing without requiring custom auditing procedures and triggers.  This feature captures DML operations (Insert, Update, Delete statements) and makes the altered database available for later reporting. CDC is first enabled at the database level, and then to the necessary tables.  After you enable a table for CDC, a similar table is created to track the data changes.

Insert and delete operations are represented as a single record in the change audit table, whereas update statements are represented as two records. Insert and delete statements involve only a single aspect of data: the row that has been inserted or the row that has been deleted. On the other hand, update statements involve two different sets of data: the record values that existed before the update statement and those that exist after the statement occurs.  The CDC system captures both of these records, which makes it very convenient to compare values correlated to the update statement.

In addition to capturing the previous value and new value from Update statements, CDC also captures the fields that were updated in the form of a bitmask. This mask can be used to easily determine which fields were included in the Update statement, and which fields were not.

As time goes on, and data keeps changing, you may find yourself in a position to clear out some of the data that has been captured by the CDC system. Luckily, SQL Server 2008 includes a retention policy for CDC that allows you to specify and remove data that has been captured and held for a certain amount of days.

CDC also does a pretty good job of keeping track and accommodating new fields and alterations to the underlying tables that are being audited. This is a great feature for reporting any type of structure changes.

The CDC process scans the transaction log to capture the changed data and write it to the audit tables. The auditing process is asynchronous and occurs in its own transaction scope, so there’s no overhead involved in the transaction that alters the data.

Your database does not need to be set to the Full recovery model in order to take advantage of CDC’s features. However, there are some things to consider if you’re using the Simple recovery model with CDC. The major point of interest is the fact that the log will not truncate until the data altered has been captured by the CDC process. This may eventually delay log truncation accomplished by the Checkpoint process, which could cause the log file to remain larger than expected. This is something that you should consider if you’re limited to space on your SQL Server.

In addition to capturing data that has been changed for CDC audited tables, CDC also has the ability to keep track of Data Definition Language (DDL) changes that are executed to the base tables. The ability to capture and maintain this data is paramount not only for auditing purposes, but also for change management purposes.

Next time

Today I looked at the capabilities and features of the new Change Data Capture technology in SQL Server 2008. These are without doubt great new features. However, the functionality present in CDC is available in SQL Server 2005, and can be accomplished with some custom coding and a little bit of replication work. Next time I’ll expand on the CDC subject and show you how you can set up data auditing using CDC on your SQL Server 2008 system. Perhaps in a future article I’ll look at how you can develop your own custom CDC functionality in SQL Server 2005.