Thursday 14 October 2010

SSIS Doesn't Play Nice with Special Characters

Oh SSIS you cantankerous old thing, you! I spent an hour today adding cleansing tasks into several data flows to replace special characters with other values ie "" to "-" and "[" to "(" etc. Only to have the content of those dataflows disappear when I closed then reopened the saved package. When I say disappear, I mean empty. Completely. Devoid of all steps. An empty shell.

Only dataflows which had the new special character cleansing step were empty - the rest were fine. This placed the blame squarely on the new steps. I should have twigged when I attempted to copy the step from one dataflow to another and got an error - something about HEX values being invalid. Stupid me, blindly continued creating them manually in each dataflow. Once done, I saved and closed. Then when I reopened, those dataflows were completely empty.

Fortunately being a concientous backer-upper-er-er, I had the previous version stored elsewhere. Sans special character cleansing of course.

So first, why do I need to do cleansing at all? These characters have passed through SSIS into the staging database with no problems. They have also passed through SSIS into the datamart database with no hiccups. Well the problem comes when I process the dimensions which use the datamart tables as their source. Then I get errors.

So, for whatever reason, cleansing ahoy!
But SSIS won't do it, as proven by my frustrating waste of an hour.
I tried using them directly in my source query with the same result. So I created udf_CleanseSpecialCharacters to handle it.


/**********************************************************************************************************************************************
HEADER
/***********************************************************************************************************************************************
FN Name: dbo.udf_CleanseSpecialCharacters
Author: Kristen Hodges
Date: 14 October 2010

-------------------------------------------------------------------------------------------------------------------------------------------
Purpose:
--------------------------------------------------------------------------------------------------------------------------------------------
Return a given string, cleansed of special characters

--------------------------------------------------------------------------------------------------------------------------------------------
OVERVIEW
--------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------------------------------
NOTES
--------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------------------------------
ASSUMPTIONS AND BUSINESS RULES
--------------------------------------------------------------------------------------------------------------------------------------------
It may be desirable in certain contexts to retain some characters whereas in others it is not
eg comma stored in table but not as a ssas dimension member unique key
The @iOptional_CharactersToCleanseList is an optional parameter which can be used to pass in a string of characters which should be stripped and
any others will be retained

The @iStripCharacters is a true/false flag which will strip the characters if true
or if false, the value will be replaced with particular values eg [ becomes -

--------------------------------------------------------------------------------------------------------------------------------------------
USAGE
--------------------------------------------------------------------------------------------------------------------------------------------
Called by:

Sample USage:
select dbo.udf_CleanseSpecialCharacters('John&kathy@somedomain.com.au',1,0,NULL)
select dbo.udf_CleanseSpecialCharacters('John&kathy@somedomain.com.au',1,1,NULL)
select dbo.udf_CleanseSpecialCharacters('John&kathy@somedomain.com.au',1,1,'&')

--------------------------------------------------------------------------------------------------------------------------------------------
CHANGE LOG
--------------------------------------------------------------------------------------------------------------------------------------------
AUTHOR DATE CHANGE
KHODGES 2010-10-14 Created

************************************************************************************************************************************************/

************************************************************************/

ALTER FUNCTION [dbo].[udf_CleanseSpecialCharacters](
@iStringToCleanse varchar(500)
,@iStripCharacters bit
,@iStripAllSpaces bit
,@iOptional_CharactersToCleanseList varchar(50)
)
RETURNS varchar(500) AS BEGIN

DECLARE @return varchar(500), @CurrentChar char(1), @CurrentReplacement varchar(10), @posn int

SET @return = @iStringToCleanse

set @CurrentChar = CHAR(5) --
if @iStripCharacters = 0 set @CurrentReplacement = '' else set @CurrentReplacement = '-'
IF PATINDEX('%'+@CurrentChar+'%',@iOptional_CharactersToCleanseList) >= 1 OR @iOptional_CharactersToCleanseList IS NULL
SET @return = REPLACE(@return,@CurrentChar,@CurrentReplacement)

set @CurrentChar = CHAR(58) -- :
if @iStripCharacters = 0 set @CurrentReplacement = '' else set @CurrentReplacement = '-'
IF PATINDEX('%'+@CurrentChar+'%',@iOptional_CharactersToCleanseList) >= 1 OR @iOptional_CharactersToCleanseList IS NULL
SET @return = REPLACE(@return,@CurrentChar,@CurrentReplacement)

set @CurrentChar = CHAR(59) -- ;
if @iStripCharacters = 0 set @CurrentReplacement = '' else set @CurrentReplacement = '-'
IF PATINDEX('%'+@CurrentChar+'%',@iOptional_CharactersToCleanseList) >= 1 OR @iOptional_CharactersToCleanseList IS NULL
SET @return = REPLACE(@return,@CurrentChar,@CurrentReplacement)

set @CurrentChar = CHAR(91) -- [
if @iStripCharacters = 0 set @CurrentReplacement = '' else set @CurrentReplacement = '-'
IF PATINDEX('%'+@CurrentChar+'%',@iOptional_CharactersToCleanseList) >= 1 OR @iOptional_CharactersToCleanseList IS NULL
SET @return = REPLACE(@return,@CurrentChar,@CurrentReplacement)

set @CurrentChar = CHAR(93) -- ]
set @CurrentReplacement = ''
IF PATINDEX('%'+@CurrentChar+'%',@iOptional_CharactersToCleanseList) >= 1 OR @iOptional_CharactersToCleanseList IS NULL
SET @return = REPLACE(@return,@CurrentChar,@CurrentReplacement)

set @CurrentChar = CHAR(37) -- %
set @CurrentReplacement = ''
IF PATINDEX('%'+@CurrentChar+'%',@iOptional_CharactersToCleanseList) >= 1 OR @iOptional_CharactersToCleanseList IS NULL
SET @return = REPLACE(@return,@CurrentChar,@CurrentReplacement)

set @CurrentChar = CHAR(38) -- &
if @iStripCharacters = 0 set @CurrentReplacement = '' else begin
set @CurrentReplacement = ' and '
IF @iStripAllSpaces = 1 set @CurrentReplacement = replace(@CurrentReplacement,' ','-')
end
IF PATINDEX('%'+@CurrentChar+'%',@iOptional_CharactersToCleanseList) >= 1 OR @iOptional_CharactersToCleanseList IS NULL
SET @return = REPLACE(@return,@CurrentChar,@CurrentReplacement)

set @CurrentChar = CHAR(124)-- |

if @iStripCharacters = 0 set @CurrentReplacement = '' else set @CurrentReplacement = '-'
IF PATINDEX('%'+@CurrentChar+'%',@iOptional_CharactersToCleanseList) >= 1 OR @iOptional_CharactersToCleanseList IS NULL
SET @return = REPLACE(@return,@CurrentChar,@CurrentReplacement)

set @CurrentChar = CHAR(44) -- ,
if @iStripCharacters = 0 set @CurrentReplacement = '' else set @CurrentReplacement = SPACE(1)
IF PATINDEX('%'+@CurrentChar+'%',@iOptional_CharactersToCleanseList) >= 1 OR @iOptional_CharactersToCleanseList IS NULL
SET @return = REPLACE(@return,@CurrentChar,@CurrentReplacement)

set @CurrentChar = CHAR(36) -- $
set @CurrentReplacement = ''
IF PATINDEX('%'+@CurrentChar+'%',@iOptional_CharactersToCleanseList) >= 1 OR @iOptional_CharactersToCleanseList IS NULL
SET @return = REPLACE(@return,@CurrentChar,@CurrentReplacement)

set @CurrentChar = CHAR(35) -- #
if @iStripCharacters = 0 set @CurrentReplacement = '' else set @CurrentReplacement = 'No.'
IF PATINDEX('%'+@CurrentChar+'%',@iOptional_CharactersToCleanseList) >= 1 OR @iOptional_CharactersToCleanseList IS NULL
SET @return = REPLACE(@return,@CurrentChar,@CurrentReplacement)

set @CurrentChar = CHAR(35) -- #
if @iStripCharacters = 0 set @CurrentReplacement = '' else set @CurrentReplacement = 'No.'
IF PATINDEX('%'+@CurrentChar+'%',@iOptional_CharactersToCleanseList) >= 1 OR @iOptional_CharactersToCleanseList IS NULL
SET @return = REPLACE(@return,@CurrentChar,@CurrentReplacement)

set @CurrentChar = CHAR(64) -- @
if @iStripCharacters = 0 set @CurrentReplacement = '' else begin
set @CurrentReplacement = ' at '
IF @iStripAllSpaces = 1 set @CurrentReplacement = replace(@CurrentReplacement,' ','-')
end
IF PATINDEX('%'+@CurrentChar+'%',@iOptional_CharactersToCleanseList) >= 1 OR @iOptional_CharactersToCleanseList IS NULL
SET @return = REPLACE(@return,@CurrentChar,@CurrentReplacement)

set @CurrentChar = CHAR(46) -- .
if @iStripCharacters = 0 set @CurrentReplacement = '' else begin
set @CurrentReplacement = ' dot '
IF @iStripAllSpaces = 1 set @CurrentReplacement = replace(@CurrentReplacement,' ','-')
end
IF PATINDEX('%'+@CurrentChar+'%',@iOptional_CharactersToCleanseList) >= 1 OR @iOptional_CharactersToCleanseList IS NULL
SET @return = REPLACE(@return,@CurrentChar,@CurrentReplacement)


IF @iStripAllSpaces = 1 SET @return = REPLACE(@return,' ','')

RETURN @return
END

No comments: