Monday, 20 July 2009

SQL 2008: Using MERGE to do a Type 2 Change

I'm going to do a brief overview of the SQL 2008 MERGE statement. Primarily for my own purposes - being to document the usage for type 2 changes so I don't forget it.

First let's begin with the basic type 1 usage.


All fairly self-explanatory. We have a source and a target. We join the two tables, identify new records, changed records and deleted records. And deal with them appropriately.

So let's move onto the type 2 changes. Here we need to:

  • Insert New Records
  • Disable changed records
  • Add a new record for the changed records
That's 2 inserts and an update. The MERGE won't allow that so we have to nest our merge inside an insert. Luckily this works because MERGE allows output!


Ignore the outer INSERT for the moment. We have a MERGE statement which is very similar to our type 1 MERGE. Compare the Source and Target, INSERT new records as required. The difference being that when the matched record contains a change, instead of updating the changed field, we set the CurrentFlag and the EndDate so as to disable the record.


Of course that means we still don't have a record which contains our updated values. So, now lets turn our attention to the OUTPUT clause.

This will be familiar to anyone who used it in SQL 2005 in much the same context - to identify rows which have been modified/added/deleted. Really the origin of MERGE seems to be strongly rooted in the SQL 2005 OUTPUT clause. On the upside, unlike the SQL 2005 version and UPDATE is an UPDATE and not a DELETE and an INSERT. Small but handy! Anyway, where were we? Let's return the columns we need to create a new record and a special $Action column - this tells us what actually happened ie UPDATE, INSERT or DELETE. We need this because the OUTPUT will return everything that occured in the MERGE.


Now we have a set of rows that have been actioned by the MERGE. Still, we haven't actually done anything with them.

To do that, we enclose the entire MERGE statement, including it's OUTPUT clause of course, into brackets so we can make it the FROM clause for an INSERT. Our OUTPUT clause is return the columns we need for the INSERT remember? So now we just add our INSERT INTO at the front, add an alias to the closed bracket of the FROM clause AND, drumroll... whack a WHERE clause on the end. This where clause just filters the OUTPUT from the MERGE so we only insert new records for those that were updated, ignoring those that may have been inserted etc in the MERGE.

Easy!


-------------------------------------------------------------------------------------------------

For the purposes of copy/paste-ability, here is a text version of the TSQL. Images have been used above for improved readability.

--A simple type 1 change
MERGE INTO DimCustomer c
USING StageCustomer stg ON c.CustomerId = stg.CustomerId
WHEN NOT MATCHED BY TARGET THEN --ie new
INSERT (CustomerId,CustomerName,IsActive) VALUES (
stg.CustomerId
,stg.CustomerName
,stg.IsActive
)
WHEN MATCHED AND c.CustomerName <> stg.CustomerName THEN --ie changed
UPDATE SET c.CustomerName = stg.CustomerName
WHEN NOT MATCHED BY SOURCE THEN --ie deleted
DELETE


--a type 2 change
INSERT INTO dbo.DimCustomer
SELECT CustomerId, CustomerName, IsActive, IsCurrent, Eff_Date, End_Date
FROM
( MERGE DimCustomer c
USING StageCustomer stg ON c.CustomerId = stg.CustomerId
WHEN NOT MATCHED THEN INSERT VALUES (
stg.CustomerId
,stg.CustomerName
,stg.IsActive
,1 --> IsCurrent
,GETDATE() --> Eff_Date
,NULL --> End_Date
)
WHEN MATCHED AND c.IsCurrent = 1 AND (c.CustomerName <> stg.CustomerName) THEN UPDATE SET
c.IsCurrent = 0
,c.End_date = GETDATE()
OUTPUT $Action as RowAction
,stg.CustomerId
,stg.CustomerName
,1 AS IsCurrent
,GETDATE() as Eff_Date
,NULL as End_Date
) m
WHERE m.RowAction = 'UPDATE'

No comments: