Monday, 18 February 2008

SSIS Synchronous/Asynchronous Script Transformations in a Data Flow

Asynchronous just means it consumes all rows before progressing to the next step in the data flow – unlike most other transformations where the entire data flow is executed for each row before the next row is commenced. Script transformations can be synchronous (eg Lookup) or asynchronous (eg Sort or Aggregate).

A synchronous script transformation is identified because the output is linked to a specified input. This is done in the Inputs and Outputs area of the task by entering the ID from the Input properties in the SynchronousInputID field of the Output properties. For each row of input, there will be one row of output.

An asynchronous script transformation is identified firstly because the SynchronousInputID field of the Output properties contains "None" or "0". When a new row is to be added, it is explicitly done using:

Output0Buffer.AddRow()

In order to aggregate data, flatten rows or concatenate data from different rows, you simply set a flag to identify when a new row should be added. To do this you need to declare a boolean in the class but outside the main sub:

Public Class ScriptMain

Inherits UserComponent

Private blnFirstRow As Boolean = True
Dim opMessage1 As String = ""

Public Overrides
Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

In this example, I'm taking all rows and flattenning them into one row so the first step in my sub is:

If blnFirstRow = True Then Output0Buffer.AddRow()
blnFirstRow = False
opMessage1 = opMessage1 & " " & Row.ipMessage.ToString

Note that I then concatenate a variable with a new value… Alternately, you could set a flag so that a new row is added when certain conditions are met such as the value of the ipCustomerNo changes.

One other thing to add - you need to use the PostExecute() sub to write to variables.

6 comments:

Anonymous said...

I'm having an issue and you seem to be the indidvidual to help me out here.

Im currently faced with a project where the FlatFileSource - File name and path can be different on a day to day bassis. But the format of these files never changes.

Im currently using a VB 6.0 app to call the DTS.

Is there a way to pass a fliepath and name programmatically from VB 6.0 to the FlatFileSource?

Kristen Hodges said...

Absolutely.

Now it's been a few years since I used SQL 2000 and I no longer have it installed so I'm going from memory which means the wording of some instructions may not be 100% accurate however the process is correct.

So what do you do?

(1) In your DTS package create a global variable to hold the name of the file.

(2) Create a dynamic property task which sets the path value of the connection

(3) in your VB app:

* instantiate a new instance of DTS package

* load the DTS package using the LoadFromSQLServer method

* set the value of the global variable using the GlobalVariables collection - referencing it by the name property and setting the value property of the item.

(4) execute the package

Another alternative method is as as per the article - http://database.ittoolbox.com/groups/technical-functional/sql-server-l/dts-and-using-parameters-107631

...however, I think this is a finicky way of doing it that puts too much onus on the VB app and makes it difficult to change the DTS package down the track but really either method will achieve what you want it to.

Kristen Hodges said...

Oh I presumed you mean SQL 2000 because you referred to DTS (as opposed to SSIS). If my assumption is in fact wrong let me know.

jamiet said...

"Asynchronous just means it consumes all rows before progressing to the next task in the data flow "

Not true. There are loads of different ways of describing asynchronous components but that is not one of them. The Merge Join is asynchronous - does it consume all rows before progressing to the next "task"? No.

"Task" is the wrong nomenclature too. Tasks are in the control-flow. Components (or transformations) are in the dataflow.

Kristen said...

fair point re task v transformation v step. clearly I've been inconsistent about my usage in this post and used task in one case where I should have used transformation.

To clarify, I believe asynchronous is defined this way, because we are discussing it in the context of the script transformation. Not my choice of terms, but that's what the documentation refers to so that's what I've also used. for example: synchronous (http://msdn.microsoft.com/en-us/library/ms136114(v=SQL.90).aspx)
and asychronous (http://msdn.microsoft.com/en-us/library/ms136133(v=SQL.90).aspx)

Richard said...

Great blog, Kristen! I'd love to read more about the advantages / disadvantages of using either synch or asynch script transformations - and when to use one over the other.

Then, if it's not too much to ask - I've been looking for sample code for each synch and asynch. I've found plenty of sample code, including on the MS website, for asynch and synch but they are not complete. When I copy them, they do not output more than one row. I want to see a simple script transformation that transforms one or two columns and outputs it.