Monday, 22 December 2008
SQL 2005 SP3 has landed
SP3 for SQL 2005 is now available for download.
Included are a number of bug fixes, performance enhancements for using SSRS in SharePoint Integration mode, improved rendering of PDF fonts in SSRS, support for SQL 2008 databases with SQL 2005 Notification Services, and support for Teradata databases as a data source for Report Models.
This Service Pack is cumulative and can be applied to any version of SQL 2005.
Posted by Kristen Hodges at 11:02 am 0 comments
Labels: Bug, Reporting Services (SSRS), Server Administration, Sharepoint 2007 (MOSS), SQL
Wednesday, 17 December 2008
PPS-P: Where, oh where, does my data go?
Firstly, to answer the why do we care question, ask yourself this...
- When do your extracts from PPS run (presuming you run it more regularly than say once overnight)?
- Do users eg approvers expect to see updates immediately - particularly at certain times of the cycle ie month end?
- What data do reviewers, approvers and report consumers expect to see? Approved? Currency Converted?
(1) The user selects the Save Privately action
- The data is stored on the local file system - known as the Offline Cache. This location is usually C:\Documents and Settings\%username%\Local Settings\Application Data\Microsoft\PerformancePoint\OfflineCache
- In the Offline Cache, the Catalog.xml file stores a list of all assignments stored locally
- In the Offline Cache, each assignment is stored in a .dat file. The .dat file is an XML file which contains binaries.
- Data is NOT sent to the SQL server or the cube
- The status of the assignment is still 'started'
(2) The user submits a draft
- The data is stored in the SQL table called MG_ModelName_Default_Partition
- The resulting values from Definition Rules are stored in the MG table also, however they're linked to the 'No Assignment' assignment
- Once the pre-configured polling period has passed, the cube called ApplicationLabel_ModelSiteLabel will be refreshed
- The status of the assignment is 'Partial'
(3) The user submits the assignment (final submission)
- The data is stored in the SQL table called MG_ModelName_Default_Partition
- The resulting values from Definition Rules are stored in the MG table also, however they are linked to the 'No Assignment' assignment
- Once the pre-configured polling period has passed, the cube called ApplicationLabel_ModelSiteLabel will be refreshed
- The status of the assignment is 'Submitted'
(4) A data movement job occurs
- The data is copied from the source model's MG_ModelName_Default_Partition table to the destination model's MG_ModelName_Default_Partition table
- Data is aggregated when the scope of a dimension (in the data association) has been set to "All" on the source model.
- All data is copied: Partial, Submitted, Approved
How long does each step take?
- Send it to the server (Status = Pending): depends on the size of the submission (How many rows? Is the workbook attached? How many dimensions?) and network bandwidth.
- Process the Queue (Status = WaitProcess): depends on the Planning Process Poll Interval (recommendation is 1 minute), how many items are in the queue and, of course, how long it takes to validate the submission.
- Cube Processes: depends on the OLAP Cube Refresh Interval property in the Workflow tab of PAC.
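If you want to spot-check what has actually landed server-side at any of these points, a quick query against the model's MG table can help. This is only a sketch - the AssignmentID and Value column names, and the model name, are assumptions rather than the guaranteed PPS Planning schema, so check your actual MG_ModelName_Default_Partition table first.
-- Sketch only: model name and column names (AssignmentID, Value) are assumptions -
-- verify against your own MG table before relying on this.
SELECT AssignmentID,
       COUNT(*)     AS RowsLoaded,
       SUM([Value]) AS TotalValue
FROM dbo.MG_MyModel_Default_Partition    -- hypothetical model name
GROUP BY AssignmentID
ORDER BY AssignmentID;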
Posted by Kristen Hodges at 3:22 pm 0 comments
Labels: PerformancePoint, Planning
Weighing in on Project Gemini
My reception to Project Gemini to date has been lukewarm. Sure, self-service cubes in Excel sound sexy and look sexy... but... we don't let our users have free rein with data for a reason. These controls didn't just happen because we want to keep ourselves in a job (I hope!).
I'm gonna take a stab and suggest that every BI pro has walked into a job and found an environment where there are 2000 reports on a single system and some of them consume all the machine resources and take 10 minutes to execute. That sort of malarky. So we did the hard slog, got the BAs' backs up and whipped everyone back into line. The users were happier because the reports executed quickly and were easy to find, you were happier because there were far fewer reports to maintain, the server team was happier because the box didn't have to be rebooted every day and the BAs were happier because everyone else was happy (one hopes!).
This is why we put controls in place.
So. Doesn't this leave Project Gemini dead in the water in the real world?
Well, I read an interesting perspective from Patrick Husting which made me re-think my thoughts (no pun intended).
"People are always going to do self service reporting in some way whether we like it or not, no matter how good the underlying data is, so why not do it in a controlled manner where everything is audited and logged and the IT team has full visibility of what is going"
Ok so, to quote from a really crappy movie I saw recently "just because it is, doesn't mean it should be". But I think he might actually be right. To a point.
I really do hate to be a naysayer. And I'm not just saying that. But, I have a big BUT (please no jokes!).
Where is all this magical data that is being analysed coming from in the first place? From systems. Well, guess what, we don't let users have holus-bolus access to our databases, to avoid the situations caused by our predecessors.
Actually that's not my only issue... users do NOT have a big picture view of the world. They don't care what other parts of the business are doing and they are not interested in how data aggregates upwards beyond their little fish pond. They are not interested in whether there is a single source of truth or if their version of the truth is different to someone else's. They are not interested in network bandwidth or performance at other sites.
Nor should they care about this stuff, that's not their job. It's ours.
It's all very well to say "everything is audited and logged" but aren't we supposed to operate preventatively not reactively?
So I have to ask, isn't that what something like ProClarity is for? To allow users to slice and dice data however they choose while ensuring that the structures and models have been put in place to consider the big picture?
You know what, I think I've just talked myself back to my original stance. It's sexy and I like it but I can't envision practical applications in the real world yet.
That said, I'm always happy to be proven wrong.
Posted by Kristen Hodges at 9:26 am 0 comments
Labels: Cubes, Excel, ProClarity, Project Gemini
Tuesday, 18 November 2008
PerformancePoint Planning - Enable/Disable Spreading
Contrary to the incorrect instructions on Office Online, to enable and disable spreading on a matrix you must do the following:
From the Advanced group on the PPS ribbon, you select Options and the SpreadingType drop-down allows you to enable/disable spreading.
Posted by Kristen Hodges at 3:21 pm 1 comments
Labels: PerformancePoint, Planning
Tuesday, 11 November 2008
Using SAP as a Data Source
I was recently asked a question which I thought might be of value to everyone.
Can you please let me know...
1. Is share Point and Performance Point can be used in SAP BI or is it strictly Integrated with MircoSoft Technologies!!
So my response was...
I’m assuming you mean “can you use SAP as a data source?”
Firstly, SharePoint isn’t a BI technology as such. It’s a collaboration, portal & document management tool. However it does have the ability to create lists and simple KPIs which you could use SAP data as a source for if you wanted to.
That said, I’m sure your SAP operations team would never allow you to link directly to SAP – you would need to extract into a database first.
Secondly, you can certainly use SAP as a data source for both PerformancePoint modules - Planning and Monitoring & Analyzing. However you would never link directly to the system from PerformancePoint. You would almost always extract from your transactional system into a datawarehouse and that would be your data source (there may be exceptions from time to time but generally it is wise to separate your reporting engine and your transactional engine. There are numerous gains in doing so - security, performance, aggregation, modelling).
The other key consideration is licensing. SAP is pretty strict about licensing when it comes to extracting data so you would need to take that into consideration also.
Hope that answers your question!
Posted by Kristen Hodges at 10:00 am 0 comments
Labels: PerformancePoint, SAP, Sharepoint 2007 (MOSS)
Wednesday, 5 November 2008
PerformancePoint Planning - Impersonate a Contributor during Form Development
It would be absolutely invaluable to have the ability to impersonate a 'contributor' during form development.
Currently, when developing a form, I have 2 sessions of Excel open - one as me and one as a contributor. This also means I need to create a cycle and an assignment, and republish every time I make a change to the form I'm working on. Merely to validate that I am getting 'yellow cells' as expected (which is by no means a sure thang!).
A feature similar to Analysis Services, where I can impersonate a particular user/role, would eliminate the need for all that extra effort - republishing, regenerating etc.
If you too would value this functionality, please vote for it at https://connect.microsoft.com/feedback/ViewFeedback.aspx?FeedbackID=380070&SiteID=181
Posted by Kristen Hodges at 2:55 pm 0 comments
Labels: PerformancePoint, Planning
PerformancePoint Planning - Object Names
Just a quick FYI....
In PerformancePoint Planning, objects are named as follows:
- Cubes: ApplicationLabel_ModelSiteLabel
- Databases: As specified in PAC during creation
- Application Label must be unique and cannot be changed
- When connecting to an application, the Application Name is displayed
Posted by Kristen Hodges at 1:32 pm 0 comments
Labels: PerformancePoint, Planning
Tuesday, 4 November 2008
Microsoft Azure.... the blue loo
Microsoft announced its Azure service last week. In their words, Azure is a tool that allows you to "Build new applications in the cloud - or use interoperable services that run on Microsoft infrastructure to extend and enhance your existing applications". From our perspective as BI developers, Azure hosts not just data but also Visual Studio solutions, ie development and deployment within the cloud.
Not that it's limited to Microsoft tools and environments of course.
So I immediately joined the CTP for SQL Services. I've downloaded the SQL Data Services SDK and have been placed on a waitlist for access to the cloud.
From what I can tell so far, you can use REST or SOAP for CRUD operations and LINQ for queries. Interestingly, there is no 'schema' as such - new properties can be added to entities as required - no need to add a new 'column'. What that means in practical terms I'm yet to find out. I suppose it will become apparent once I can access the cloud.
More info will follow shortly.
Posted by Kristen Hodges at 9:01 am 0 comments
Labels: Microsoft Azure, SQL
Tuesday, 28 October 2008
What’s New in SQL 2008 – Part 3 – Table Type
The new Table type allows you to save a table definition in the database which can be reused later to define table variables, stored procedure parameters and functions. To create a Type, we use the CREATE TYPE clause, for example:
CREATE TYPE dbo.ProductList as TABLE (
ProductName varchar(200) NULL,
ProductCategory varchar(200) NOT NULL)
Important Notes:
The metadata is visible using sys.table_types
Variables/parameters cannot be set to NULL – if no table is supplied, an empty table is generated
Not affected by transaction rollbacks
Do not cause recompilations
Examples
Create Table-Type Variables
DECLARE @t as dbo.ProductList;
INSERT INTO @t VALUES
('Grey Slate Paver 10x10','Paver'),
('Blue Slate Paver 10x10','Paver'),
('Heritage Blue 12x6','Brick')
Creating a Stored Procedure with a Table-Typed Parameter
CREATE PROC dbo.uspGetOrdersforProduct (
@t as dbo.ProductList READONLY,
@StartDate as Date,
@EndDate as Date
) AS
SELECT o.OrderId,
o.OrderDate,
o.CustomerId,
o.ProductId,
o.Amount,
o.Qty
FROM tOrder as o
INNER JOIN tProduct t on t.ProductId = o.ProductId
INNER JOIN @t as f on f.ProductCategory = t.ProductCategory and (f.ProductName is NULL or t.ProductName = f.ProductName)
WHERE o.OrderDate between @StartDate and @EndDate
Passing Table-Parameters into Stored Procedures
DECLARE @t as dbo.ProductList;
INSERT INTO @t VALUES
(NULL,'Paver'),
(NULL,'Brick')
EXEC dbo.uspGetOrdersforProduct @t, '20091001', '20091031'
Posted by Kristen Hodges at 3:01 pm 0 comments
Labels: SQL
What’s New in SQL 2008 – Part 2 – Data Types
Date and Time Data Types
Data Type | Size (bytes) | Example | Notes | Client Providers
DATE | 3 | '2009-10-01' | | OLEDB: DBTYPE_DBDATE; ADO.Net: DateTime; ODBC: SQL_DATE
TIME | 3-5 | '13:12:00.1234567' | Accurate to 100 nanoseconds | OLEDB: DBTYPE_DBTIME; ADO.Net: TimeSpan; ODBC: SQL_TIME
DATETIME2 | 6-8 | '2009-10-01 13:12:00.1234567' | Accurate to 100 nanoseconds. Can specify precision ie decimal places eg DATETIME2(3) | OLEDB: DBTYPE_DBTIMESTAMP; ADO.Net: DateTime; ODBC: SQL_TIMESTAMP
DATETIMEOFFSET | 8-10 | '2009-10-01 13:12:00.1234567+02:00' | Accurate to 100 nanoseconds | OLEDB: DBTYPE_DBTIMESTAMP; ADO.Net: DateTime; ODBC: SQL_TIMESTAMP
There are a number of new functions which can also be used with these new datatypes, including SYSDATETIME(), SYSUTCDATETIME(), SYSDATETIMEOFFSET(), SWITCHOFFSET() and TODATETIMEOFFSET().
Other Data Types
Data Type | Size | Usage | Returns | Notes
HIERARCHYID | VARBINARY, eg 5 bytes for 100k records at 6 levels | See below | '/1/' | Represents position in a hierarchy. Actually a CLR type with several methods eg GetRoot(), GetDescendant(), ToString()
DECLARE @h hierarchyid = hierarchyid::GetRoot()
DECLARE @c hierarchyid
SET @c = @h.GetDescendant(NULL, NULL)
SELECT @c.ToString()
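As a quick illustration of the new types and functions together (a sketch only - the literal values and offsets are arbitrary):
DECLARE @d   DATE           = '2009-10-01';
DECLARE @t   TIME(7)        = '13:12:00.1234567';
DECLARE @dt2 DATETIME2(7)   = '2009-10-01 13:12:00.1234567';
DECLARE @dto DATETIMEOFFSET = '2009-10-01 13:12:00.1234567 +02:00';

SELECT SYSDATETIME()                    AS CurrentDateTime2,      -- current date/time as DATETIME2
       SYSDATETIMEOFFSET()              AS CurrentDateTimeOffset, -- includes the server's UTC offset
       SWITCHOFFSET(@dto, '+08:00')     AS SameInstantNewOffset,  -- same point in time, different offset
       TODATETIMEOFFSET(@dt2, '+02:00') AS Dt2WithOffsetAttached; -- attach an offset to a DATETIME2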
Posted by Kristen Hodges at 2:52 pm 0 comments
Labels: SQL
What's New in SQL 2008 - Part 1 - Functionality
Declare & Initialise
DECLARE @iValue as integer = 100;
Compound Assignments (+= -= *= /= %=)
SET @iValue -= 50
UPDATE dimCustomer SET Balance += 200
Multi-Row Inserts Using VALUES
The insert is executed as a single transaction.
INSERT INTO dimProduct (Label, BusinessKey, LastUpdatedBy, LastUpdatedDate) VALUES
('Grey Slate Paver 10x10',1111,'dbo','20091001 13:12:00'),
('Blue Slate Paver 10x10',1112,'dbo','20091001 13:12:00'),
('Black Slate Paver 10x10',1113,'dbo','20091001 13:12:00')
Interestingly, it can also be used like this:
SELECT * FROM (VALUES
('Grey Slate Paver 10x10',1111,'dbo','20091001 13:12:00'),
('Blue Slate Paver 10x10',1112,'dbo','20091001 13:12:00'),
('Black Slate Paver 10x10',1113,'dbo','20091001 13:12:00')
) as p (Label, BusinessKey, LastUpdatedBy, LastUpdatedDate)
MERGE
MERGE INTO dbo.dimProduct as d
USING dbo.dimProductStage as s ON d.ProductKey = s.ProductId
WHEN MATCHED THEN
UPDATE SET
d.ProductName = s.ProductName,
d.ProductCategory = s.ProductCategory
WHEN NOT MATCHED THEN
INSERT (ProductName, ProductCategory) VALUES (s.ProductName, s.ProductCategory)
WHEN NOT MATCHED BY SOURCE THEN DELETE
OUTPUT $action,
deleted.ProductId as DeletedId,
inserted.ProductId as InsertedId;
Grouping Sets
() empty brackets represents an 'ALL' grouping.
SELECT Year(OrderDate) as OrderYear,
SalespersonId,
OrderType,
count(OrderId) as OrderCount,
sum(Amount) as OrderAmount
FROM factOrders
GROUP BY GROUPING SETS (
(OrderType, Year(OrderDate)),
(SalespersonId, OrderType, Year(OrderDate)),
(SalespersonId, Year(OrderDate)),
(Year(OrderDate)),
());
Filtered Indexes
CREATE NONCLUSTERED INDEX idxOrders_Product
ON factOrders(OrderId, ProductId)
WHERE CreditFlag IS NULL
Change Data Capture
Tracks data changes into a table. ** to be discussed in further detail in an upcoming post **
Enabled using sys.sp_cdc_enable_db or sys.sp_cdc_enable_table
Disabled using sys.sp_cdc_disable_db or sys.sp_cdc_disable_table
Uses SQL Agent
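To give a feel for the CDC setup, a minimal sketch (the table is the factOrders example used above; the role choice is just one option):
-- Enable CDC for the database, then for an individual table.
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'factOrders',
    @role_name     = NULL;   -- NULL = don't restrict access via a gating database role

-- Changes are then read via the generated function for the capture instance, eg
-- cdc.fn_cdc_get_all_changes_dbo_factOrders(@from_lsn, @to_lsn, N'all')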
Posted by Kristen Hodges at 2:50 pm 0 comments
Labels: SQL
SharePoint Magazine - Part 2 available now
My next article for SharePoint Magazine is available now.
http://sharepointmagazine.net/technical/administration/sell-sell-sell-why-build-a-dashboard-anyway-part-2-of-6
Posted by Kristen Hodges at 11:12 am 2 comments
Labels: Monitoring and Analysing, PerformancePoint, Sharepoint 2007 (MOSS)
Tuesday, 14 October 2008
PerformancePoint Planning: Error When Deleting Members of a Member Set
Unfortunately I discovered a problem in PPS Planning today which caused me a few headaches... thought I would pass it on to you!! The issue occurs when editing/creating a dimension member set: a save will fail if you do an outdent and a delete member immediately afterwards. Let me demonstrate. Let's presume you have a structure something like this:
--- Expenses
--- --- Salaries & Related
--- --- --- Salaries
--- --- --- --- Base Salary
--- --- --- --- Bonuses
--- --- --- --- Allowances
--- --- --- Tax Expense
But you want to remove the Salaries level and end up with a structure that looks like this:
--- Expenses
--- --- Salaries & Related
--- --- --- Base Salary
--- --- --- Bonuses
--- --- --- Allowances
--- --- Tax Expense
If you outdent the child members and then delete the Salaries member before saving, you will get an error when you save. In order for it to work correctly, you MUST save the member set after the outdent, before deleting the member. I have logged this problem on Microsoft Connect. Please feel free to vote! https://connect.microsoft.com/feedback/ViewFeedback.aspx?FeedbackID=375153&SiteID=181
Posted by Kristen Hodges at 12:14 pm 0 comments
Labels: Bug, PerformancePoint, Planning
Thursday, 18 September 2008
SharePoint Magazine - Part 1 Out Now!
I've commenced a 6-part series on building PerformancePoint dashboards in SharePoint which you can read at SharePoint Magazine.
I'll be releasing a new part each week for the next 6 weeks covering the various aspects of building dashboards - including why bother at all!
Have a read if you're interested. Feedback so far has been excellent.
Posted by Kristen Hodges at 10:07 am 0 comments
Labels: PerformancePoint, Sharepoint 2007 (MOSS)
Monday, 25 August 2008
Interesting Read
Great article comparing the PIVOT (SQL 2005+) function with doing it the old-fashioned way! Take a read...
http://www.sqlservercentral.com/articles/T-SQL/63681/
Posted by Kristen Hodges at 11:57 am 2 comments
Labels: SQL
Thursday, 3 July 2008
SQL: Concatenating Rows into a Scalar Value
This is an issue which pops up from time to time, often when passing values back to .Net applications.
The Problem
Taking values in rows and creating a single string value containing a delimited list of all those values.
The Old Solution
The method I have used in the past is to use the FOR XML clause and build a string that way. It works.
DECLARE @DepartmentName VARCHAR(1000)
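Continuing from that declaration, a minimal sketch of the FOR XML pattern - the tDepartment table and DepartmentName column are hypothetical stand-ins:
-- Hypothetical table/column; concatenation happens server-side via FOR XML PATH('')
SELECT @DepartmentName =
    (SELECT DepartmentName + ','
     FROM tDepartment
     ORDER BY DepartmentName
     FOR XML PATH(''))

SELECT @DepartmentName AS DelimitedList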
The Sexier Solution
This idea comes courtesy of Ken Simmons at SQL Tips... so simple and I'm kind of annoyed that I've never tried it myself!
DECLARE @DepartmentName VARCHAR(1000)
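Again continuing from the declaration, a sketch of the COALESCE trick against the same hypothetical tDepartment table:
-- Hypothetical table/column; each row appends itself to the running variable
SELECT @DepartmentName = COALESCE(@DepartmentName + ',', '') + DepartmentName
FROM tDepartment

SELECT @DepartmentName AS DelimitedList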
What Were the Actual Results from Testing?
Sadly, the sexier solution under-performs in comparison to the FOR XML solution.
Method | Milliseconds | BytesReceived | Rows Returned |
FOR XML | 78 | 9891 | 3 |
COALESCE | 265 | 10505 | 60921 |
Obviously this lag is due to the number of rows which are returned to the client. Based on the stats, the FOR XML method does all its aggregation and concatenation server-side whereas the COALESCE method does it client-side.
So I guess that means the old way is still the best. Or is it? I wonder in what other circumstances COALESCE could be used? Ok so it under-performed in this instance but I suspect there are other uses which could be of great value. Got any ideas?
Posted by Kristen Hodges at 4:36 pm 0 comments
Labels: SQL
Tuesday, 20 May 2008
Reporting Services: Passing MultiValue Parameters
I recently came across an article by Wayne Sheffield on SQL Server Central which contained a very neat idea for passing multi-value parameters from SSRS to a SQL stored proc by using XML.
Because SQL stored procs can't handle arrays, they can't handle parameters with multiple values. There are a few ugly ways around this, of course, by using delimiters and manipulating strings, but that just isn't pretty at all. Wayne's idea is to use XML string parameters.
So SSRS would send a string in the following format:
<root>
<node>
<element>element data</element>
</node>
</root>
It would look something like this:
<Customers>
<Customer>
<CustomerId>1234</CustomerId>
</Customer>
</Customers>
Wayne has written a bit of code which you can add to your report or create a DLL for which can then be referenced by your report.
Function ReturnXML(ByVal MultiValueList As Object, ByVal Root As String, ByVal Node As String, ByVal Element As String) As String
'**************************************************************************
' Returns an XML string by using the specified values.
' Parameters: MultiValueList - a multi value list from SSRS
'             Root, Node, Element - strings to use in building the XML string
'**************************************************************************
Dim ReturnString = ""
Dim sParamItem As Object
ReturnString = "<" & Root & ">"
For Each sParamItem In MultiValueList
ReturnString &= "<" & Node & "><" & Element & ">" & Replace(Replace(sParamItem,"&","&"),"<", "<") & "" & Element & ">" & Node & ">"
Next
ReturnString &= "" & Root & ">"
Return (ReturnString)
End Function
This code would be referenced in your Reporting Services parameter like:
ReturnXML(Parameters!MultiValue.Value, "Customers", "Customer", "CustomerId")
To then use your XML parameter within the stored proc:
Select CustomerId, CustomerName, ActiveFlag
From tCustomer a
INNER JOIN @ipCustomerList.nodes('/Customers/Customer') AS x(item) ON a.CustomerId = x.item.value('CustomerId[1]', 'integer')
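For completeness, a sketch of how the full proc might be declared - the proc name is hypothetical and the tCustomer columns are just those used above:
CREATE PROC dbo.uspGetCustomersByList   -- hypothetical name
(
    @ipCustomerList XML
)
AS
SELECT a.CustomerId, a.CustomerName, a.ActiveFlag
FROM tCustomer a
INNER JOIN @ipCustomerList.nodes('/Customers/Customer') AS x(item)
    ON a.CustomerId = x.item.value('CustomerId[1]', 'int')
GO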
Pretty handy no?
Posted by Kristen Hodges at 10:49 am 3 comments
Labels: Reporting Services (SSRS), SQL
Thursday, 8 May 2008
Datamining Part II - Terminology
Datamining, like all other IT subjects, has its own lingo. This quick blog post will explain the key terms.
Datamining
Datamining attempts to deduce knowledge by examining existing data
Case
A case is a unit of measure.
It equates to a single appearance of an entity. In relational terms that would mean one row in a table. A case includes all the information relating to an entity.
Variable
The attributes of a case.
Model
A model stores information about variables, the algorithms used and their parameters, and extracted knowledge. A model can be descriptive or predictive - its behaviour is driven by the algorithm which was used to derive it.
Structure
A structure stores models.
Algorithm
My definition here is from the perspective of datamining rather than a general definition. An algorithm is a method of mining data. Some methods are predictive (forecasting) and some are relative (showing relationships). 7 algorithms are included with SQL Server 2005.
Neural Network
An algorithm designed to predict in a non-linear fashion, like a human neuron. Often used to predict outcomes based on previous behaviour.
Decision Tree
An algorithm which provides tree-like output showing paths or rules to reach an end point or value.
Naive Bayes
An algorithm often used for classifying text documents, it shows probability based on independent data.
Clustering
An algorithm which groups cases based on similar characteristics. Often used to identify anomalies or outliers.
Association
An algorithm which describes how often events have occurred together. Defines an 'itemset' from a single transaction. Often used to detect cross-selling opportunities.
Sequence
An algorithm which is very similar to the association algorithm except that it also includes time.
Time Series
An algorithm used to forecast future values of a time series based on past values. Also known as Auto Regression Trees (ART).
Cluster
A cluster is a grouping of related data.
Discrete
This is more a statistical term than a strictly datamining term however it is used frequently - hence its inclusion here. Discrete refers to values which are not sequential and have a finite set of values eg true/false
Continuous
Continuous data can have any value in an interval of real numbers. That is, the value does not have to be an integer. Continuous is the opposite of discrete.
Outlier
Data that falls well outside the statistical norms of other data. An outlier is data that should be closely examined.
Antecedent
When an association between two variables is defined, the first item (or left-hand side) is called the antecedent. For example, in the relationship "When a prospector buys a pick, he buys a shovel 14% of the time," "buys a pick" is the antecedent.
Leaf
A node at its lowest level - it has no more splits.
Mean
The arithmetic average of a dataset
Median
The arithmetic middle value of a dataset
Standard Deviation
Measures the spread of the values in the data set around the mean.
Skew
Measures the symmetry of the data set ie is it skewed in a particular direction on either side of the mean
Kurtosis
Measures whether the data set has lots of peaks or is flat in relation to a normal distribution
Posted by Kristen Hodges at 10:42 am 0 comments
Labels: Data-Mining, SQL
Monday, 5 May 2008
SQL Server Releases
Microsoft have announced that they are changing their approach to releases for SQL Server. This is interesting because SQL Server releases can be a touchy subject for businesses, particularly those with big server farms. Inevitably the development team wants the Service Pack to be installed ASAP whereas the server team is keen to protect their stable server and pretend service packs don't exist. This means a lot of pushing and shoving.
This new approach should help to alleviate the pressure a little but I'm not altogether convinced.
· Smaller Service Packs which will be easier to deploy
I suspect smaller service packs will make server teams less inclined to come to the party, because fewer inclusions per service pack inherently implies more service packs.
· Higher quality of Service Pack releases due to reduced change introduced
It's all very well to say that the quality is better but that's a very airy fairy 'benefit' which I can't imagine will go down very effectively with server teams as an argument for implementation. It's just not very quantitative which means server teams are likely to ignore it.
· Predictable Service Pack scheduling to allow for better customer test scheduling and deployment planning.
On this point, I demur. This can have a huge impact on getting releases implemented. Presuming of course that you can get your server team to operate on a scheduled release process themselves. It's all very well for the vendor to do it but if the server team doesn't ALSO do it, there's no gain. That said, I believe that such a process SHOULD be followed. I just don't see it as terribly likely. I fervently hope to be disproven.
It's really easy to be cynical about this approach and say 'my organisation will never do this'. Which is the trap I've fallen into here I realise, but the fact of the matter is, good on Microsoft for considering these issues and attempting to find ways to improve them. The approach is right and a positive move. Now the onus is on us to follow in their footsteps. This should be a wakeup call to server and development teams to find more common ground, to develop processes which satisfy everyone's needs and to communicate with each other better.
For more details:
http://blogs.msdn.com/sqlreleaseservices/archive/2008/04/27/a-changed-approach-to-service-packs.aspx
Posted by Kristen Hodges at 9:04 am 0 comments
Labels: Server Administration, SQL
Tuesday, 15 April 2008
Datamining – Reviewing the Data Source
I'm currently going through the Training Kit for Exam 70-445 – the MCTS SQL BI exam. And data-mining, joy of joys, an area where I'm not exactly at expert level, is the focus of my study. And of course, by all reports, data-mining makes up a large chunk of the exam. In the book, they provide sample code for the 4 moments of a particular attribute which can be used to validate test vs training data when reviewing source data for a data mining project.
The Four Moments
Moments are individual measures or parameters of the moment method. The moment method estimates and equates samples of data. The moments most commonly used in data mining are:
Mean – the centre point of the data set
Standard Deviation – measures the spread of the values in the data set around the centre point
Skew – measures the symmetry of the data set ie is it skewed in a particular direction on either side of the centre point
Kurtosis – measures whether the data set has lots of peaks or is flat in relation to a normal distribution
Datasets
In reviewing source data, you need a training data set (to teach the patterns) and a test data set to validate whether that which was learnt is in fact correct! In the code below, we are returned 2 rows – a row for each dataset – showing the 4 moments for both. If the values are fairly similar in both rows then we know the sampling is valid.
Sample Code
This example uses the vTargetMail view (specifically the Age column) from the AdventureWorksDW database:
--populate testing data
Select * into TK445_Ch09_TestSet
from vTargetMail
where rand(checksum(newid())%1000000000+CustomerKey) <.2
-- populate training data
Select * into TK445_Ch09_TrainingSet
from vTargetMail
where rand(checksum(newid())%1000000000+CustomerKey) <.2
except select * from TK445_Ch09_TestSet
--can be used to validate that datasets have been populated
--select 1,* from TK445_Ch09_TrainingSet union all select 2,* from TK445_Ch09_TestSet
SELECT
Mean = MIN(m.mean),
StdDev = Min(m.StdDev),
Skew = SUM(((Age*1.0-m.mean)/m.StdDev)*((Age*1.0-m.mean)/m.StdDev)*((Age*1.0-m.mean)/m.StdDev))*MIN(m.corrfact1),
Kurt = sum(square(square(((Age*1.0-m.mean)/m.StdDev)))) * MIN(m.corrfact2) - min(m.subfact)
FROM TK445_Ch09_TrainingSet v
CROSS JOIN (SELECT
mean = AVG(age*1.0),
stddev = STDEV(Age),
corrfact1 = count(*)*1.0 / (Count(*)-1) / (count(*)-2),
corrfact2 = count(*)*1.0 * (Count(*)+1) / (count(*)-1) / (COUNT(*)-2) / (COUNT(*)-3),
subfact = 3.0 * SQUARE((COUNT(*)-1)) / (COUNT(*)-2) / (COUNT(*)-3)
FROM TK445_Ch09_TrainingSet) m
UNION
SELECT
Mean = MIN(m.mean),
StdDev = Min(m.StdDev),
Skew = SUM(((Age*1.0-m.mean)/m.StdDev)*((Age*1.0-m.mean)/m.StdDev)*((Age*1.0-m.mean)/m.StdDev))*MIN(m.corrfact1),
Kurt = sum(square(square(((Age*1.0-m.mean)/m.StdDev)))) * MIN(m.corrfact2) - min(m.subfact)
FROM TK445_Ch09_TestSet v
CROSS JOIN (SELECT
mean = AVG(age*1.0),
stddev = STDEV(Age),
corrfact1 = count(*)*1.0 / (Count(*)-1) / (count(*)-2),
corrfact2 = count(*)*1.0 * (Count(*)+1) / (count(*)-1) / (COUNT(*)-2) / (COUNT(*)-3),
subfact = 3.0 * SQUARE((COUNT(*)-1)) / (COUNT(*)-2) / (COUNT(*)-3)
FROM TK445_Ch09_TestSet) m
Posted by Kristen Hodges at 9:36 am 1 comments
Labels: Certification, Data-Mining, SQL
Thursday, 13 March 2008
Microsoft 70-556 Exam - MCTS Performance Point
Last week I completed and successfully passed the 70-556 exam - which gives me an MCTS PerformancePoint certification.
The exam was relatively simple - I scored highly in all areas but one. Obviously a weak point there on my part... my success in the other areas was due to a week and a bit of study using the Rational Guides to PerformancePoint (authored by my esteemed colleagues Adrian Downes and Nick Barclay) - the Planning volume and the Monitoring & Analyzing Volume. Much thanks to them!
The exam is strongly skewed towards Planning - only a few questions on Monitoring & Analysing - and the M&A questions were very simple. There are quite a few questions on Business Rules and Configuring Models & Dimensions - and these are complex areas so spend time on them in your study.
If you are doing the exam and using the Rational Guide books, don't ignore the Bonus chapters - you'll need them!
Next up is the 70-445 (MCTS SQL BI) which is much harder... I'm currently spending a lot of time with the Microsoft Training kit. Wish me luck...
Posted by Kristen Hodges at 6:07 pm 0 comments
Labels: Certification, PerformancePoint
Thursday, 21 February 2008
Using TSQL Queries for AD Authentication
Yes, you can directly query AD using TSQL queries. And it's REALLY easy. Why, you ask? Well, LDAP can be painfully slow if you have to trawl through groups, eg if you are storing the results in a table or outputting them to screen or suchlike. I've found this method to be pretty reliable and reasonable with regards to performance.
First things first… on a SQL server, you need to create a linked server to AD as follows:
EXEC master.dbo.sp_addlinkedserver @server = N'ADSI', @srvproduct=N'Active Directory Service Interfaces', @provider=N'ADSDSOObject', @datasrc=N'adsdatasource'
GO
EXEC master.dbo.sp_serveroption @server=N'ADSI', @optname=N'collation compatible', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'ADSI', @optname=N'data access', @optvalue=N'true'
GO
EXEC master.dbo.sp_serveroption @server=N'ADSI', @optname=N'dist', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'ADSI', @optname=N'pub', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'ADSI', @optname=N'rpc', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'ADSI', @optname=N'rpc out', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'ADSI', @optname=N'sub', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'ADSI', @optname=N'connect timeout', @optvalue=N'0'
GO
EXEC master.dbo.sp_serveroption @server=N'ADSI', @optname=N'collation name', @optvalue=null
GO
EXEC master.dbo.sp_serveroption @server=N'ADSI', @optname=N'lazy schema validation', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'ADSI', @optname=N'query timeout', @optvalue=N'0'
GO
EXEC master.dbo.sp_serveroption @server=N'ADSI', @optname=N'use remote collation', @optvalue=N'true'
GO
That's all the configuration you have to do. You are good to go!
Below is an example of the TSQL syntax. Note that the DC is the domain in which the AD resides – in this instance it has three parts, DC=MyDomainName,DC=MyTopLevelDomain,DC=MyDomainCountryCode eg DC=microsoft,DC=com,DC=au. In this example, I am returning the username of all users who are a member of the AD group called GAMyADGroupName; the final sAMAccountName condition then narrows it down to check whether a particular user is a member of that group:
SELECT *
FROM OPENQUERY(ADSI,
'SELECT sAMAccountName
FROM ''LDAP://DC=csr,DC=com,DC=au''
WHERE objectCategory = ''Person''
AND objectClass = ''user''
AND memberOf = ''CN=GAMyADGroupName,OU=Other,OU=Global Groups,OU=Security Groups,DC=MyDomainName,DC=com,DC=au''
AND sAMAccountName = ''MyUserName''')
Posted by Kristen Hodges at 1:05 pm 4 comments
Labels: ActiveDirectory, SQL
Monday, 18 February 2008
SSIS Synchronous/Asynchronous Script Transformations in a Data Flow
Asynchronous just means the transformation consumes all rows before progressing to the next step in the data flow – unlike most other transformations, where the entire data flow is executed for each row before the next row is commenced. Script transformations can be synchronous (like a Lookup) or asynchronous (like a Sort or Aggregate).
A synchronous script transformation is identified because the output is linked to a specified input. This is done in the Inputs and Outputs area of the task by entering the ID from the Input properties in the SynchronousInputID field of the Output properties. For each row of input, there will be one row of output.
An asynchronous script transformation is identified firstly because the SynchronousInputID field of the Output properties contains "None" or "0". When a new row is to be added, it is explicitly done using:
Output0Buffer.AddRow()
In order to aggregate data, flatten rows or concatenate data from different rows, you simply set a flag to identify when a new row should be added. To do this you need to declare a boolean in the class but outside the main sub:
Public Class ScriptMain
    Inherits UserComponent
    Private blnFirstRow As Boolean = True
    Dim opMessage1 As String = ""
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
In this example, I'm taking all rows and flattening them into one row, so the first step in my sub is:
        If blnFirstRow = True Then Output0Buffer.AddRow()
        blnFirstRow = False
        opMessage1 = opMessage1 & " " & Row.ipMessage.ToString
Note that I then concatenate a variable with a new value… Alternatively, you could set a flag so that a new row is added when certain conditions are met, such as when the value of ipCustomerNo changes.
One other thing to add - you need to use the PostExecute() sub to write to variables.
Posted by Kristen Hodges at 1:21 pm 6 comments
Tuesday, 8 January 2008
VS Db Pro – Script Deployment - Database Context
When deploying databases using Db Pro, the scripts which are executed have a database context. Visual Studio will automatically include a 'USE [YourDbName]', where YourDbName is taken from the project's BUILD property. This is the context in which the scripts execute. This is important because if you change it in any of the scripts, it can mess with your deployment.
For example, you create a script called 'AgentJobs.sql' in the Post-Deployment folder and add a reference to the script in Script.PostDeployment.sql. This means your AgentJobs script will be executed when you do a deployment. Now assuming you add a 'USE MSDB' statement in the script, your deployment runs successfully HOWEVER any scripts which execute after AgentJobs eg Data.sql (they execute in alphabetical order) will now have MSDB as the context rather than [YourDbName] as expected.
So make sure that you use fully qualified references to objects in other databases and DON'T use 'USE'!
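As an illustration, a post-deployment script that creates a SQL Agent job could call the msdb procedures with fully qualified names instead of switching context (a sketch only - the job name here is hypothetical):
-- AgentJobs.sql (post-deployment)
-- Fully qualify the msdb objects rather than issuing USE MSDB,
-- so subsequent scripts keep [YourDbName] as their database context.
EXEC msdb.dbo.sp_add_job
    @job_name = N'Nightly ETL Load',   -- hypothetical job name
    @enabled  = 1;

EXEC msdb.dbo.sp_add_jobserver
    @job_name    = N'Nightly ETL Load',
    @server_name = N'(LOCAL)';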
Posted by Kristen Hodges at 4:22 pm 0 comments
Labels: SQL, Visual Studio Db Pro (DataDude)