One of the things I have been working with lately is the XML data type in SQL Server 2005. I am a fan of XML, and I love working with SQL Server. When the two “officially” came together in SQL Server 2005 I could not have been happier.
The project I am working on didn’t put the xml data type to use immediately. I am a firm believer in the concept of not using something just because it is cool (although I do love to play with all the new cool technologies coming out). However, it recently became readily apparent that the mechanism we had in place would not fill the requirements of what we were trying to accomplish, and the xml data type seemed to be the answer.
The problem we were running into is the size of a string parameter we were passing to a number of stored procedures, which in turn are the data source for some reports. This string parameter contained a comma separated list of employees that were selected from a list via the front end application. If no employees were selected then we just didn’t include that in the query filter and we were good to go. But what if they did select some employees to filter on? How big do you make that varchar parameter? Meaning, the parameter in the stored procedure looked like this:
CREATE PROCEDURE tempproc
(
@startdate DateTime,
@enddate DateTime,
@emplist Varchar(1000)
)
As
…
And this even brought into question the issue of filtering on this list. It is no secret that the following is not possible in T-SQL:
WHERE Employee.EmployeeID IN (@emplist)
The solution to this problem was to create a user defined function which took that parameter of comma separated strings and returned a table of ID’s. This function looks like the following:
CREATE Function udf_SplitIDString
(
@IDString varchar(1000)
)
Returns @IDs table
(ID int)
AS
BEGIN
Declare @Position int
While len(@IDString) > 0
BEGIN
SET @Position = CHARINDEX(‘, ‘ @IDString)
If @Position > 0
BEGIN
INSERT @IDs
SELECT CONVERT(int, LEFT(@IDString, @Position – 1))
SET @IDString = RIGHT(IDString, len(IDString – Position)
END
ELSE
BEGIN
INSERT @IDs
SELECT CONVERT(int, @IDString)
END
END
END
RETURN
END
By using this method, it meant that the following join would need to take place in each stored procedures that needed this functionality.
INNER JOIN dbo.udf_SplitIDString(@emplist) dbo.employee.employeeid = udf_SplitIDString.ID
This solution worked, and worked well. We did see a very slight decrement in performance but not anything to raise an alarm. The decrease in performance was in the parsing of the id string in the function.
However, we need to go back to the original question. Is a Varchar(1000) big enough? A size of 1000 would only get you just over a couple hundred employees (assuming you started at employeeid 1). If you incremented that number to 2000, you would only get just over 450 employees (again, assuming you started at employeeid 1). If you started with an employeeid any higher, the number of returned employees would decrement significantly.
Many of our customers are quite large and could easily fill up this list when selecting a range of employees (for example, selecting a range by department or location, etc).
In comes the xml data type. If you haven’t read anything about the xml data type in SQL Server 2005, you need to stop everything you are doing and spend a good hour or so reading up on it. For a “first iteration”, Microsoft did an extremely fine job.
SQL Server 2005 comes with an xml data type. This data type can be used in creating a column, as shown below in figure 1, or as variables.

Figure 1
Since we needed to pass xml, we chose the latter, meaning, to use it as a parameter variable in our stored procedures. Using the xml data type as a variable would remove the constraint of having to limit the size of the string being passed.
Using the xml data type as a variable would also provide one other benefit. It meant that the join to the function did not have to be made and the parsing of the comma separated string did not have to been done.
Now, our parameter list looked like the following:
CREATE PROCEDURE tempproc
(
@startdate DateTime,
@enddate DateTime,
@emplist xml
)
As
…
An example of the xml that would be passed to the stored procedure looks like the following:
<ids>
<v id="2780"/>
<v id="1678"/>
<v id="2831"/>
<v id="3977"/>
<v id="729"/>
<v id="1142"/>
</ids>
The join then became the following:
INNER JOIN @emplist.nodes('ids/v') as EmpJoinValues(EmpJoin) on EmpJoin.value('@id','int') = hr180.HR180ID
This method worked wonderful. It was fast and efficient. It was time to take it to the next level.
Many of the reports we run filter on a number of criteria, such as location, department, category, and others. Some of our larger reports filter on up to 6 or 7 criteria. For example, our parameter list could look like this:
(
@startdate DateTime,
@enddate DateTime,
@emplist Varchar(1000),
@loclist Varchar(1000),
@deptlist Varchar(1000),
@catlist Varchar(1000),
@classlist Varchar(1000),
@paylist Varchar(1000),
)
As
…
Just like the first example, depending on the number of items they pick from each list, we could easily reach the 1000 character length mark. But the more important issue is now we are doing a 6 or more joins using the SplitIDString function, and now this function has to parse a whole lot of strings. Not too efficient.
So our thought was that if we could do xml with one of those parameters, why couldn’t we do it with the rest? So, we converted the other parameters to xml data type. Our parameter list now looked like the following:
(
@startdate DateTime,
@enddate DateTime,
@emplist xml,
@loclist xml,
@deptlist xml,
@catlist xml,
@classlist xml,
@paylist xml,
)
As
…
We also adjusted the WHERE clause appropriately to join on the xml just as we did in our first test. For example:
INNER JOIN @emplist.nodes('ids/v') as EmpJoinValues(EmpJoin) on EmpJoin.value('@id','int') = hrtable.employeeid
INNER JOIN @loclist.nodes('ids/v') as EmpJoinValues(EmpJoin) on EmpJoin.value('@id','int') = location.locationid
And so on.
When we ran the report, we got an error stating that the query was too complex. Huh? We started trouble shooting and decided to add the parameters back as varchar(1000) and add the xml data type one at a time for each parameter.
The report ran extremely well until our xml parameter list hit 5 or 6 and more, at which point the following error was returned:
The query processor ran out of internal resources and could not produce a query plan.
This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions.
Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.
In working with Microsoft on this, it turns out that this is a bug in the query compilation. Their interim/workaround for now is to change the WHERE clause to the following:
AND hrtable.employeeid IN (SELECT [joinvalues].[ref].value(‘.’, ‘int’) FROM @emplist.nodes(‘/ids/values/@id’) joinvalues([ref])),
And so on for each xml join.
We are currently running tests to determine performance and thresholds. Stay tuned for the results.
Summary
As I stated earlier, I love XML and I love SQL Server, and even though we have had to provide some workarounds for our solutions, it still in many ways makes me take a step back and realize how cool this technology lives. This article is not intended to deter you from using the xml data type. Quite the opposite. In fact, I’d recommend that you look to see where your database and application might benefit from such as great technology.