Tuesday 20 May 2008

Reporting Services: Passing MultiValue Parameters

I recently came across an article by Wayne Sheffield on SQL Server Central which contained a very neat idea for passing multi-value parameters from SSRS to a SQL stored proc by using XML.

Because SQL stored procs can't handle arrays, they can't handle parameters with multiple values. There are a few ugly workarounds of course, such as passing delimited strings and splitting them apart, but that just isn't pretty at all. Wayne's idea is to pass the values as an XML string parameter instead.

So SSRS would send a string in the following format:


<root>
       <node>
              <element>element data</element>
       </node>
</root>


So for a list of customer IDs it would look something like this:


<Customers>
       <Customer>
              <CustomerId>1234</CustomerId>
       </Customer>
</Customers>

Wayne has written a bit of code which you can add to your report's embedded code, or compile into a DLL which your report can then reference.


Function ReturnXML(ByVal MultiValueList As Object, ByVal Root As String, ByVal Node As String, ByVal Element As String) As String
       '**************************************************************************
       ' Returns an XML string built from the specified values.
       ' Parameters: MultiValueList - a multi-value parameter list from SSRS
       '             Root, Node, Element - strings to use in building the XML string
       '**************************************************************************
       Dim ReturnString As String = ""
       Dim sParamItem As Object
       ReturnString = "<" & Root & ">"
       For Each sParamItem In MultiValueList
              ' Escape characters which have special meaning in XML before embedding the value
              ReturnString &= "<" & Node & "><" & Element & ">" & _
                     Replace(Replace(sParamItem, "&", "&amp;"), "<", "&lt;") & _
                     "</" & Element & "></" & Node & ">"
       Next
       ReturnString &= "</" & Root & ">"
       Return (ReturnString)
End Function


This function would then be referenced in your Reporting Services parameter expression like so:

ReturnXML(Parameters!MultiValue.Value, "Customers", "Customer", "CustomerId")
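
In practice - and this is just a sketch of my own, with the proc name dbo.pGetCustomers being purely illustrative - the expression above is supplied as the value of a dataset parameter, and the dataset query simply executes the proc:

-- Dataset query text; @ipCustomerList is the dataset parameter whose value is
-- set to the Code.ReturnXML(...) expression on the dataset's Parameters tab
EXEC dbo.pGetCustomers @ipCustomerList = @ipCustomerList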

To then use your XML parameter within the stored proc:

SELECT a.CustomerId, a.CustomerName, a.ActiveFlag
FROM tCustomer a
       INNER JOIN @ipCustomerList.nodes('/Customers/Customer') AS x(item)
              ON a.CustomerId = x.item.value('CustomerId[1]', 'int')
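
For context, here's a minimal sketch of what the full stored procedure might look like - the procedure name dbo.pGetCustomers is again just illustrative, the key point being that the parameter is declared with the xml data type:

CREATE PROCEDURE dbo.pGetCustomers
       @ipCustomerList XML               -- the XML string built by ReturnXML()
AS
BEGIN
       SET NOCOUNT ON;

       -- Shred the XML into rows and join back to the customer table
       SELECT a.CustomerId, a.CustomerName, a.ActiveFlag
       FROM tCustomer a
              INNER JOIN @ipCustomerList.nodes('/Customers/Customer') AS x(item)
                     ON a.CustomerId = x.item.value('CustomerId[1]', 'int');
END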


Pretty handy no?

Thursday 8 May 2008

Datamining Part II - Terminology

Datamining, like all other IT subjects, has its own lingo. This quick blog post will explain the most common terms.

Datamining
Datamining attempts to deduce knowledge by examining existing data

Case
A case is the basic unit of analysis.
It equates to a single appearance of an entity - in relational terms, one row in a table. A case includes all the information relating to that entity.

Variable
The attributes of a case.

Model
A model stores information about the variables, the algorithm used and its parameters, and the knowledge extracted. A model can be descriptive or predictive - its behaviour is driven by the algorithm which was used to derive it.

Structure
A structure stores models.

Algorithm
My definition here is from the perspective of datamining rather than a general definition. An algorithm is a method of mining data. Some algorithms are predictive (forecasting) and some are descriptive (showing relationships). The main algorithms included with SQL Server 2005 are described below.

Neural Network
An algorithm designed to model complex, non-linear relationships, loosely modelled on the neurons of the human brain. Often used to predict outcomes based on previous behaviour.

Decision Tree
An algorithm which provides tree-like output showing paths or rules to reach an end point or value.

Naive Bayes
An algorithm often used for classifying text documents; it calculates probabilities by treating each input variable as independent.

Clustering
An algorithm which groups cases based on similar characteristics. Often used to identify anomalies or outliers.

Association
An algorithm which describes how often events have occurred together, defining 'itemsets' from the items that appear in a single transaction. Often used to detect cross-selling opportunities.

Sequence
An algorithm which is very similar to the association algorithm, except that it also takes time (the order of events) into account.

Time Series
An algorithm used to forecast future values of a time series based on past values. Also known as Auto Regression Trees (ART).

Cluster
A cluster is a grouping of related data.

Discrete
This is more a statistical term than a strictly datamining term, however it is used frequently - hence its inclusion here. Discrete refers to data which can only take one of a finite set of values, e.g. true/false.

Continuous
Continuous data can have any value in an interval of real numbers. That is, the value does not have to be an integer. Continuous is the opposite of discrete.

Outlier
Data that falls well outside the statistical norms of other data. An outlier is data that should be closely examined.

Antecedent
When an association between two variables is defined, the first item (or left-hand side) is called the antecedent. For example, in the relationship "When a prospector buys a pick, he buys a shovel 14% of the time," "buys a pick" is the antecedent.

Leaf
A node at the lowest level of a tree - it has no further splits.

Mean
The arithmetic average of a dataset

Median
The middle value of a dataset when the values are ordered from smallest to largest

Standard Deviation
Measures the spread of the values in the data set around the mean.

Skew
Measures the asymmetry of the data set, i.e. whether the values are skewed in a particular direction on either side of the mean

Kurtosis
Measures whether the data set is more sharply peaked or flatter than a normal distribution
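
For reference, one common form of the sample formulas behind these last few definitions (here x̄ is the mean, s the standard deviation and n the number of values; the exact corrections vary slightly between tools):

\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, \qquad
\mathrm{skewness} = \frac{\tfrac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^3}{s^3}, \qquad
\mathrm{kurtosis} = \frac{\tfrac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^4}{s^4} - 3
\]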

Monday 5 May 2008

SQL Server Releases

Microsoft have announced that they are changing their approach to releases for SQL Server. This is interesting because SQL Server releases can be a touchy subject for businesses, particularly those with big server farms. Inevitably the development team wants the Service Pack to be installed ASAP whereas the server team is keen to protect their stable server and pretend service packs don't exist. This means a lot of pushing and shoving.

This new approach should help to alleviate the pressure a little but I'm not altogether convinced.

· Smaller Service Packs which will be easier to deploy
I suspect smaller service packs will make server teams less inclined to come to the party, because fewer fixes per service pack inherently implies more service packs.

· Higher quality of Service Pack releases due to reduced change introduced
It's all very well to say that the quality is better, but that's a very airy-fairy 'benefit' which I can't imagine will go down well with server teams as an argument for implementation. It's just not very quantitative, which means server teams are likely to ignore it.

· Predictable Service Pack scheduling to allow for better customer test scheduling and deployment planning.
On this point, I demur. This can have a huge impact on getting releases implemented - presuming, of course, that you can get your server team to operate on a scheduled release process themselves. It's all very well for the vendor to do it, but if the server team doesn't ALSO do it, there's no gain. That said, I believe that such a process SHOULD be followed; I just don't see it as terribly likely. I fervently hope to be disproven.

It's really easy to be cynical about this approach and say 'my organisation will never do this' - which is the trap I realise I've fallen into here. But the fact of the matter is, good on Microsoft for considering these issues and attempting to find ways to improve them. The approach is right and a positive move. Now the onus is on us to follow in their footsteps. This should be a wakeup call to server and development teams to find more common ground, to develop processes which satisfy everyone's needs and to communicate with each other better.

For more details:
http://blogs.msdn.com/sqlreleaseservices/archive/2008/04/27/a-changed-approach-to-service-packs.aspx