Data Management

Using Hash Fields in SQL Server

From auditing data changes to capturing data for loading a data warehouse, hash fields have a ton of practical uses. In today’s article, database consultant Tim Chapman shows how you can use built-in SQL Server functions to create your own hash fields.

 A hash field is a calculated numeric field based on the value(s) contained in one or more fields.  This calculated value is great for detecting data changes, which can be put to use in auditing strategies or datawarehouse ETL (extraction, transformation, loading) operations.

Creating a hash field in SQL Server is easy.  The database engine has a couple of built-in functions that can be used to generate the hashed value.  Two of these functions are CHECKSUM() and BINARY_CHECKSUM().  CHECKSUM() is intended to be used to build hash indexes while BINARY_CHECKSUM() is great for detected data changes.  I'll be using the BINARY_CHECKSUM funtion for the purposes of todays writing.

Using the hash field

In the following example I'll show you how you can take advantage of the BINARY_CHECKSUM() function.  First, I'll create the SalesHistory table and load some data into it.

CREATE TABLE [dbo].[SalesHistory]

(

                SaleID int IDENTITY(1,1),

                Product varchar(10) NULL,

                SaleDate datetime NULL,

                StatusID TINYINT NULL,

                SalePrice money NULL

)

GO

 

SET NOCOUNT ON

 

BEGIN TRANSACTION

DECLARE @i INT

SET @i = 1

WHILE (@i <=500)

BEGIN

 

                INSERT INTO [SalesHistory](Product, SaleDate, SalePrice, StatusID)

                VALUES ('Computer', DATEADD(ww, @i, '3/11/1919'),

                DATEPART(ms, GETDATE()) + (@i + 57), 1)

 

                INSERT INTO [SalesHistory](Product, SaleDate, SalePrice, StatusID)

                VALUES('BigScreen', DATEADD(ww, @i, '3/11/1927'),

                DATEPART(ms, GETDATE()) + (@i + 13),5)

 

                INSERT INTO [SalesHistory](Product, SaleDate, SalePrice, StatusID)

                VALUES('PoolTable', DATEADD(ww, @i, '3/11/1908'),

                DATEPART(ms, GETDATE()) + (@i + 29),8)

 

                SET @i = @i + 1

 

END

COMMIT TRANSACTION

 

ALTER TABLE SalesHistory

ADD CONSTRAINT pk_SalesHistory_SaleID PRIMARY KEY CLUSTERED (SaleID)

Once my table is created, I can add a calculated field to the SalesHistory table using the BINARY_CHECKSUM() function.

ALTER TABLE SalesHistory

ADD HashField AS BINARY_CHECKSUM(SaleID, StatusID, Product)

 

The HashField I create above creates a checksum on the SaleID, StatusID, and Product fields.  The resulting value is a signed integer value based on the values in the SaleID, StatusID, and Product fields.  As the values in these fields the HashField value will change also.  I can then use this HashField value to keep track of changes to specific data fields, which makes it great for capturing changes for loading a data warehouse.

Capturing these changes can be as easy as comparing each hash value to a previously stored hash value to determine changed rows to as complicated to finding row data changes and then routines to find the changed values.

Regardless of how you use it, using hash values in SQL Server is a great tool.

 

About

Tim Chapman is a SQL Server MVP, a database architect, and an administrator who works as an independent consultant in Raleigh, NC, and has more than nine years of IT experience.

3 comments
edbeo
edbeo

Hi, I hope you can help me with this question, How can I convert a hash value to sqltext? I have a SQL Server 2000 SP4 Standard ed. Thank you!

victor.gutzler
victor.gutzler

You wrote "Capturing these changes can be as easy as comparing each hash value to a previously stored hash value to determine changed rows to as complicated to finding row data changes and then routines to find the changed values." Are the stored hashvalues typically saved in another field in the same table, or is it better to save them in a separate table keyed to the record id? I assume the routine is a query of nonduplicated hash values in the calculated field and the stored field....

Tony Hopkinson
Tony Hopkinson

In the data warehousing strategy You'd have the last update to it store the hash, and then compare it with the operational DB, to see if there's any change, and may be which field changed. It's going to be a bit quicker than testing one against the other for each field or set of fields that would 'trigger' a write to teh warehouse. If it was one to one you could test for a difference in hash instead of comparing each field involved.... You'd have a bit of fun if you dropped the hash and extended it though, without doing a wee twiddle that would hit every record in both systems. You just know it's going to happen.....

Editor's Picks