SQL Server 2008's new MERGE statement allows you to insert, update, or delete data based on certain join conditions in the same statement. In previous versions of SQL Server, you have to create separate statements if you need to insert, update, or delete data in one table based on certain conditions in another table. With MERGE, you can include the logic for these data modifications in one statement.
How MERGE works
The MERGE statement basically works as separate insert, update, and delete statements all within the same statement. You specify a "Source" record set and a "Target" table, and the join between the two. You then specify the type of data modification that is to occur when the records between the two data are matched or are not matched. MERGE is very useful, especially when it comes to loading data warehouse tables, which can be very large and require specific actions to be taken when rows are or are not present.
I will simulate sales feeds being received in the database and loaded to a reporting table that records daily sales statistics. In a typical scenario, the records would be loaded into a staging table (SalesFeed in this example), and then a series of transformations or DDL statements would be executed on the reporting table (SalesArchive in this example) to update the daily sales data. The MERGE statement allows you to use one statement to update the SalesArchive table rather than use several different DDL statements, which potentially could reduce the time it takes to make the updates occur, since only one lookup is done on the data rather than several.
The following script creates the SalesArchive and SalesFeed tables:
CREATE TABLE SalesArchive
CustomerID INT PRIMARY KEY,
CreationDate DATETIME CONSTRAINT df_CreationDate DEFAULT(GETDATE()),
UpdatedDate DATETIME CONSTRAINT df_UpdatedDate DEFAULT(GETDATE())
CREATE TABLE SalesFeed
The script below loads some data into the SalesFeed table. The way in which I am inserting data into this table is new to SQL Server 2008; it allows you to specify many values to be inserted using the VALUES clause of the INSERT statement.
INSERT INTO SalesFeed
(CustomerID, Product, SaleAmount)
(4,'BigScreen', 880),(5,'Computer', 700)
I have a few rows of data in my SalesFeed table and no data in my SalesArchive table. Now it is time for me to create my MERGE statement to add data to this table. Below is the MERGE script.
MERGE SalesArchive AS SA
LoadDate = MIN(CONVERT(VARCHAR(8), GETDATE(), 112)),
TotalSalesAmount = SUM(SaleAmount),
TotalSalesCount = COUNT(*)
GROUP BY CustomerID
) AS SalesFeedCTE (CustomerID, LoadDate, TotalSalesAmount, TotalSalesCount)
SA.CustomerID = SalesFeedCTE.CustomerID AND SA.SalesDate = SalesFeedCTE.LoadDate
WHEN NOT MATCHED THEN
INSERT (CustomerID, SalesDate, TotalSalesAmount, TotalSalesCount, CreationDate, UpdatedDate)
VALUES( SalesFeedCTE.CustomerID, SalesFeedCTE.LoadDate, SalesFeedCTE.TotalSalesAmount, SalesFeedCTE.TotalSalesCount, GETDATE(), GETDATE())
WHEN MATCHED THEN
SET SA.TotalSalesAmount = SA.TotalSalesAmount + SalesFeedCTE.TotalSalesAmount,
SA.TotalSalesCount = SA.TotalSalesCount + SalesFeedCTE.TotalSalesCount,SA.UpdatedDate = GETDATE();
At first glance, it looks reasonably complicated, but it's not too bad once you get used to it. The table immediately following the MERGE statement is the table that will be modified; this is known as the TARGET table. In the USING statement, data from the SalesFeed table is being aggregated inside of a subquery based on the CustomerID; this portion is known as the SOURCE. This aggregation allows me to guarantee that there will be only one record per customer to update my SalesArchive table.
The ON clause of the MERGE statement is where I specify: the joining between the SOURCE, the aggregated data from the subquery, and the TARGET, the SalesArchive table.
The WHEN NOT MATCHED clause is where I specify what action I want to occur when the records from the SOURCE are not found in the TARGET. In this scenario, I want to insert those records into the SalesArchive table.
The WHEN MATCHED clause is where I specify what I need to occur when the records from the SalesArchive table and the subquery of the SalesFeed table are found. In this scenario, I want to update what is currently in the table for that day, such as the TotalSalesAmount, the TotalSalesCount, and the UpdatedDate.
With this scenario, if another sales feed comes into the database, only one statement will need to be run for that feed. Any new customer sales will be added to the database, and any existing sales will be updated with the new sales information.
Tim Chapman a SQL Server database administrator and consultant who works for a bank in Louisville, KY. Tim has more than eight years of IT experience, and he is a Microsoft certified Database Developer and Administrator. If you would like to contact Tim, please e-mail him at email@example.com.
————————————————————————————————————————————-Get SQL tips in your inbox
TechRepublic's free SQL Server newsletter, delivered each Tuesday, contains hands-on tips that will help you become more adept with this powerful relational database management system. Automatically subscribe today!
Tim Chapman is a SQL Server MVP, a database architect, and an administrator who works as an independent consultant in Raleigh, NC, and has more than nine years of IT experience.