"Improving XML Data Access Performance with SQL Server 2005"
This is the second in a series of three articles that look
at how the latest version of Microsoft’s enterprise-level database, SQL
Server 2005, now offers great support and close integration with XML as a
data persistence format. This includes new ways to validate, store and query
XML documents that are stored within the database. SQL Server 2005 provides
native support for XML that can vastly improve application performance, while
supporting robust and safe multi-user access to the data contained within the
XML documents.
The topics we covered in the previous article were:
- A brief overview of the way that SQL Server 2005 stores
XML documents and schemas
- How SQL Server 2005 provides support for querying and
manipulating XML documents
- A simple test application that allows you to experiment
with XQuery
In this article, we'll look at:
- Extracting data from xml columns
- Comparing traditional XML data access approaches with
XQuery
- Combining XQuery and XSL-T
Then, in the final article of the series, we'll be
exploring:
- Updating the contents of xml columns
- Comparing traditional XML update techniques with XQuery
- Using XQuery in a managed code stored procedure
Storing XML in a Database - the Traditional Approach
Just because XML was originally conceived
as a stream-based or a disk-based data persistence format doesn't mean that
developers ignored their usual data stores. After all, an XML document is
basically just a string of text characters, and so can be stored in a database
using a char, varchar, or text type column.
Of course, this means that, when you want
to use the XML, you have to read the complete document from the column. And if
you want to edit and update it, you have to write the complete document back
into the table again. If it's a large document, this can soak up network
bandwidth, processing and server resources, and reduce application performance.
Another approach is to "shred"
the document into its component parts and store each value from the document in
a separate column. This has several advantages. One is that you can store each
value in a column of the appropriate data type (int, varchar, datetime, etc.). This allows simple indexing of the document (something you
can't easily do if you store the whole thing in a single column), and means
that comparisons can be made based on the correct data type.
In other words, you can search for things
like product IDs or names without having to extract the whole document and
parse it each time. And you can select rows that meet specific conditions, such
as dates in a particular range, by simply using a SQL WHERE clause on the appropriate column.
The problem with the "shredding"
approach is the amount of developer effort required to build the data access
code that performs all these tasks, including validating the document,
extracting each value, and interfacing with the database to persist and extract
values and complete documents as required. As they say in TV commercials,
"there has to be a better way!" The better way is, of course, to use
a database such as SQL Server 2005 that provides all these features built into
the core database engine.
Storing XML in a Database - SQL Server 2005
In the previous article, we saw how SQL
Server 2005 provides several features that make persisting and managing XML
document much easier and more efficient than the traditional approaches we've
just been discussing. The core features are:
- A dedicated column type named xml that can be used to store XML
documents or fragments of XML
- The ability to register XML schemas with SQL Server 2005,
and store schema information within the database
- Automatic validation of XML documents when a schema is
present; and automatic shredding of the XML data to support efficient
querying and updating of the content
- An implementation of a subset of the W3C XQuery language
to provide this querying and update facility
- Support for hosting the .NET Common Language Runtime (CLR)
within SQL Server, which allows stored procedures that manipulate XML
documents to be written in managed code
We described these features in the previous
article, and provided a simple application that you can use to experiment with
XQuery in SQL Server 2005. In this and the subsequent article, we'll use the
techniques outlined above to demonstrate how you can save time and effort, and
get better application performance, when using XML in SQL Server 2005.
About the Example Applications
As well as the XQuery Tester application
described in the previous article, you can also use several other pages
provided in the downloadable samples to see XQuery in action. The full list of
these examples is:
- Reading XML Values the Traditional Way
- Using a Simple XQuery with an 'xml'
Column
- Comparing the Traditional Way and XQuery
- Using Parameters with an XQuery
- Combining XQuery and XSL-T
- Modifying an 'xml' Column with XQuery
- Comparing Traditional and XQuery Updates
- Using a Managed Code Stored Procedure in
SQL Server 2005
We won't be listing all of the code in this article, because
much of it uses standard ASP.NET and ADO.NET techniques to create the pages and
connect to the database. What we will concentrate on is how the examples
exploit the new XML data management features in SQL Server 2005. Many of the
examples use an un-typed xml column so that the queries are easier to assimilate (no namespace declaration
or prefixes are required). However, some processes (such as the modify method when
replacing values in the XML) require a typed column, and so the script we
provide creates two tables - one named Store with an un-typed xml column and one
named StoreTyped with a typed xml column. You'll see both used in the examples.
Traditional XML Handling Techniques versus XQuery
Traditionally, developers who need to access values in an
XML document stored in a database would read the complete document into an XML
parser, and then navigate through the document to extract the values they need.
The first example, "Reading XML Values the Traditional Way"
demonstrates this approach. It uses a SQL statement that extracts the contents
of the xml column named Demographics,
for a specified CustomerID value:
Dim sql_getbank As String = "SELECT
Demographics FROM Store " _
& "WHERE
CustomerID = @CustomerID"
The page also uses an ASP.NET GridView control and a SqlDataSource control to display a list of customers, and selecting one executes a procedure
named ShowBankName.
This procedure connects to the database, creates a command and adds a parameter
for the CustomerID to it, then calls the ExecuteScaler method of the command to
get the XML from the matching row in the database:
Sub ShowBankName(ByVal sender As Object, ByVal
e As EventArgs)
Dim sXML As String = String.Empty
Using con As New SqlConnection(ConfigurationManager.ConnectionStrings
_
("XMLTestConnectionString").ConnectionString)
Dim cmd As New SqlCommand(sql_getbank,
con)
cmd.Parameters.Add(New SqlParameter("@CustomerID",
_
Int32.Parse(grid1.SelectedDataKey.Value)))
Try
con.Open()
sXML = cmd.ExecuteScalar()
con.Close()
Catch ex As Exception
lblMessage.Text = ex.Message
End Try
End Using
...
Now the code can check if an XML document is available, and
if so load it into an XmlDocument instance and use the SelectSingleNode method to get a reference to the BankName element. The value of this element is in the
first child node (a text node), so it can be extracted from the Value property of ChildNodes(0) and
displayed in a Label control on the page:
...
If sXML <> String.Empty Then
Dim xd As New XmlDocument()
xd.LoadXml(sXML)
Dim xn As XmlNode = xd.SelectSingleNode("//BankName")
lblMessage.Text = "Bank name is:
<b>" & xn.ChildNodes(0).Value & "</b>"
Else
lblMessage.Text = "Cannot find
bank name"
End If
End Sub
Extracting a Value Using XQuery
The second example, "Using a Simple XQuery with an
'xml' Column", achieves the same result as the preceding example, but
by using an XQuery instead. The SQL query used this time calls the query method of
the Demographics column to access the BankName node, and uses the XPath data function to extract just the value (the content of the BankName element):
Dim sql_getbank As String = "SELECT Demographics.query('data(//BankName)')
" _
& "FROM
Store WHERE CustomerID = @CustomerID"
Executing this query simply returns the bank name as a String, and so the ShowBankName routine in this example - after opening the connection to the database and
executing the query - simply displays the value that is returned:
lblMessage.Text = "Bank name is:
<b>" & cmd.ExecuteScalar() & "</b>"
As you can see, this makes the code much simpler and more compact,
and has the added benefit of reducing network traffic between the database and
the data access code. Figure 1 shows the two example pages we've just
described.

Figure 1 - Reading the bank name
using both traditional techniques and XQuery
Comparing XQuery Performance with the Traditional Approach
The third example we provide, "Comparing the
Traditional Way and XQuery", does much the same as the two examples
you've just seen, but this time extracts and displays the customer name and
bank name for all 700+ rows in the Store and StoreTyped tables. It allows you to
choose one of three ways to accomplish this - using the traditional approach,
using XQuery against an un-typed xml column, and using XQuery against a typed xml column. The
three queries it uses are:
Dim sql_trad_getbank As String =
"SELECT CustomerName, Demographics FROM Store"
Dim sql_xquery_getbank As String =
"SELECT CustomerName, " _
& "BankName = Demographics.query('data(//BankName)')
FROM Store"
Dim sql_xquery_getbank_typed As String =
"SELECT CustomerName, " _
& "BankName = Demographics.query('declare
namespace x=""http://testschemas/StoreSurvey""; " _
& "data(//x:BankName)') FROM StoreTyped"
The code to extract the rows records the current time,
executes a command using the appropriate query, and then iterates through the
rows adding the customer name and bank name to a StringBuilder. Once all the rows have
been processed, the number of elapsed milliseconds is calculated, and this -
together with the list of customer and bank names - is displayed in the page.
This is the relevant section of code for the traditional (non-XQuery) approach:
Dim start As DateTime = DateTime.Now
con.Open()
reader = cmd.ExecuteReader(CommandBehavior.CloseConnection)
While reader.Read()
Dim xd As New XmlDocument()
xd.LoadXml(reader("Demographics"))
Dim xn As XmlNode = xd.SelectSingleNode("//BankName")
If Not xn Is Nothing Then
builder.Append("Customer '"
& reader("CustomerName") _
& "' banks at
" & xn.ChildNodes(0).Value & "<br />")
End If
End While
Dim span As TimeSpan = DateTime.Now.Subtract(start)
builder.Insert(0, String.Format("<p>Data
access took {0} milliseconds</p>", _
span.Milliseconds.ToString()))
lblMessage.Text = builder.ToString()
The second and third options, executing an
XQuery, use similar but much more compact code:
Dim start As DateTime = DateTime.Now
con.Open()
reader = cmd.ExecuteReader(CommandBehavior.CloseConnection)
While reader.Read()
builder.Append("Customer '"
& reader("CustomerName") _
& "' banks at
" & reader("BankName") & "<br />")
End While
Dim span As TimeSpan = DateTime.Now.Subtract(start)
builder.Insert(0, String.Format("<p>Data
access took {0} milliseconds</p>", _
span.Milliseconds.ToString()))
lblMessage.Text = builder.ToString()
Figure 2 shows the results of selecting the traditional
approach, and you can see that - with the hardware we tested against - this
took some 400 milliseconds to extract the data and process all the rows. This
time includes that taken to manipulate the StringBuilder:

Figure 2 - Reading multiple rows
using traditional XML data access techniques
However, as you can see in Figure 3, data access times are
considerably improved when using XQuery. For an un-typed xml column, the
time taken to extract and process the same number of rows is some 25% less -
the average on our system was around 300 milliseconds. And even more striking
is the performance when accessing a typed xml column. In this case the average access
and processing time was around 220 milliseconds - a reduction against the
traditional approach of some 45%. This is because SQL Server 2005 can shred the
XML and store it in a more efficient manner, providing faster access when
querying and when updating the content of XML documents.
And remember that this is with the database and Web server
on the same machine, and includes the time for iterating the rows and building
the display string in the StringBuilder.
The reduced network bandwidth usage when accessing a remote SQL Server will
enhance the differences even more, but the percentage reduction in the actual
data access times is impressive even when these factors aren't taken into
account.

Figure 3 - Reading the bank name
using both traditional techniques and XQuery
Using Parameters with XQuery
The next example, "Using Parameters with an XQuery",
demonstrates some of the ways that you can pass parameter values to an XQuery,
depending on how you want to specify the results that are included in the
rowset returned by the query. As we discussed in the previous article, you can
access the value of a T-SQL variable from within an XQuery or XPath statement
using the sql:variable function. The example provides four different queries that you can execute, including
a stored procedure that executes an XQuery, and displays the results in an
ASP.NET GridView control. Figure 4 shows the example page in action with the results of
executing the stored procedure visible.

Figure 4 - Different ways of
using parameters with an XQuery statement
The first option selects rows using traditional T-SQL
techniques, where the parameter is used in the WHERE clause. The value method of the Demographics column is used to extract the value of the NumberEmployees element, and cast it to
an int data
type so it can be compared directly to the value of the @emp parameter (of type SqlDbType.Int)
that is passed to the query by a SqlParameter added to the command. The SELECT clause
contains an XQuery that extracts just the value of the BankName element from the rows that match
the expression in the WHERE clause. This produces the same result as you see in Figure 4:
SELECT CustomerName,
BankName = Demographics.query('data(//BankName)')
FROM Store
WHERE Demographics.value('(//NumberEmployees)[1]',
'int') > @emp
The second statement attempts to replace
the WHERE clause with a condition within the XQuery itself. The call to the query method of the Demographics column includes an XPath that selects the
value of the BankName element, but only where the StoreSurvey root element has a NumberEmployees element containing a value that matches the parameter passed to the statement.
Notice how the XPath number function is used to convert the data type of the NumberEmployees element with the data
type of the parameter:
SELECT CustomerName,
BankName = Demographics.query('
data(/StoreSurvey[number(NumberEmployees[1])
> sql:variable("@emp")]/BankName)
')
FROM Store
However, this doesn't actually produce the result we want.
If you run the example and select this option, you'll see that the rowset
returned by the query contains all the rows from the table. The bank
name is NULL in rows where the number of employees is less than the value in the parameter
passed to the query. This is a common mistake, and demonstrates how you have to
be careful when using XQuery to select rows.
The third statement solves this problem by combining the
XQuery selection technique with the use of a WHERE clause. In this case, the exist method of
the Demographics column is used to return either True or False, depending on whether there is a node in the XML
document for this row that contains a NumberEmployees element with a value
greater than the value of the @emp parameter (the syntax .[1] means "the first node of the
current nodeset" - remember that the number function requires a single node,
whereas the //NumberEmployees XPath statement returns a node collection even if there is only one node that
matches). And, again, the number function is used to ensure that the data types of the element value and the
parameter value match:
SELECT CustomerName,
BankName = Demographics.query('data(//BankName)')
FROM Store
WHERE Demographics.exist('//NumberEmployees[number(.[1])
> sql:variable("@emp")]') = 1
The final option is a statement that just executes a stored
procedure named GetBankNamesByEmployee within the database. The TSQL used in this stored procedure is the same as that
described above (for the third option). However, it does demonstrate that you
can use XQuery in just the same way in a stored procedure as you do in
declarative and parameterized SQL statements:
CREATE PROCEDURE dbo.GetBankNamesByEmployee
@emp int AS
SELECT CustomerName, BankName = Demographics.query('data(//BankName)')
FROM Store WHERE Demographics.exist('//NumberEmployees[.
> sql:variable("@emp")]') = 1
Combining XQuery and XSL-T
Having seen how you can extract values from
an XML document stored in SQL Server 2005, this next example changes direction
somewhat by extracting an XML document of a specific format from the contents
of an xml column, and then uses this XML to generate an HTML page by applying
an XSL-T style sheet to the newly-generated XML. The SQL statement with its
embedded XQuery looks like this:
SELECT Demographics.query('
<sales-summary>
<customer-name>{ sql:column("CustomerName")
}</customer-name>
<sales-data>
<total-sales>{ data(//AnnualSales)
}</total-sales>
<employees>{ data(//NumberEmployees)
}</employees>
<sales-per-employee>
{ round(number((//AnnualSales)[1])
div number((//NumberEmployees)[1])) }
</sales-per-employee>
<store-size>{ data(//SquareFeet)
}</store-size>
<sales-per-sqfoot>
{ round(number((//AnnualSales)[1])
div number((//SquareFeet)[1])) }
</sales-per-sqfoot>
</sales-data>
</sales-summary>
')
FROM Store WHERE CustomerID = @custid"
You can see that it creates an XML document
with the root element sales-summary, and with two child
elements. The first is a customer-name element, and this is
followed by a sales-data element that itself contains five other items of sales-related
information. Notice how the sales-per-employee and sales-per-sqfoot values can be calculated within the XQuery using the XPath number function
and div operator. The result is then converted to a non-fractional number using the round function.
The resulting XML document will look something like this:
<sales-summary>
<customer-name>Acceptable Sales & Service</customer-name>
<sales-data>
<total-sales>800000</total-sales>
<employees>12</employees>
<sales-per-employee>66667</sales-per-employee>
<store-size>7000</store-size>
<sales-per-sqfoot>114</sales-per-sqfoot>
</sales-data>
</sales-summary>
The XSL-T Style Sheet
We then use a simple XSL-T style sheet to
transform this XML into HTML. This is the style sheet:
<?xml version="1.0"
encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<b>
Annual sales figures for customer
'<xsl:apply-templates
select="./sales-summary/customer-name" />'</b>
<p />
<xsl:apply-templates
select="./sales-summary/sales-data" />
</body>
</html>
</xsl:template>
<xsl:template match="customer-name">
<xsl:value-of
select="."/>
</xsl:template>
<xsl:template
match="total-sales">
Your total annual sales are $<xsl:value-of
select="."/><br />
</xsl:template>
<xsl:template
match="employees">
The number of people you employee is
<xsl:value-of select="."/><br />
</xsl:template>
<xsl:template
match="sales-per-employee">
Your annual sales per employee are
therefore $<xsl:value-of select="."/><br />
</xsl:template>
<xsl:template
match="store-size">
Your store has a total square footage
of <xsl:value-of select="."/><br />
</xsl:template>
<xsl:template match="sales-per-sqfoot">
This means that the annual sales per
square foot are $<xsl:value-of select="."/><br />
</xsl:template>
</xsl:stylesheet>
You can see that it creates the html and body elements, and - within the body element - applies a
template that inserts the customer name into the output. Then it applies
templates that match the five other elements within the nested sales-data element to insert these values and the appropriate accompanying
text into the output.
Executing the XQuery and Applying the Style Sheet
The relevant section of code in the ASP.NET
page "Combining XQuery and XSL-T" that uses this XQuery and
style sheet are shown next. The page uses an ASP.NET GridView control and a SqlDataSource control to display a list
of customer names, as in the first two examples in this article. Selecting a
row in the GridView executes a routine named ShowSalesReport. This routine
opens a connection to the database, creates a command, adds a parameter to
specify the selected customer, and executes the SQL statement you saw earlier.
The XML document returned from the query is
accessed by calling the GetSqlXml method of the SqlDataReaderto get a SqlXml instance, which exposes the CreateReader method that
we can use to get an XmlReader over the XML document itself. We do this, rather than extracting
the value as a String, so that we can pass the reader to the Transform method of the XslCompiledTransform class later in our code:
Sub ShowSalesReport(ByVal sender As Object,
ByVal e As EventArgs)
Dim builder As New StringBuilder()
Dim xmlr As XmlReader
Using con As New SqlConnection(ConfigurationManager.ConnectionStrings
_
("XMLTestConnectionString").ConnectionString)
Dim cmd As New SqlCommand(sql_select,
con)
cmd.Parameters.Add(New SqlParameter("@custid",
SqlDbType.Int))
cmd.Parameters("@custid").Value
= Int32.Parse(grid1.SelectedDataKey.Value)
Try
con.Open()
Dim dr As SqlDataReader = cmd.ExecuteReader(CommandBehavior.CloseConnection)
dr.Read()
Dim oXML As SqlXml = dr.GetSqlXml(0)
xmlr = CType(oXML.CreateReader(), XmlReader)
Catch ex As Exception
lblMessage.Text = ex.Message
Return
End Try
...
The next step is to create an instance of
the XslCompiledTransform class, and load the XSL-T style sheet - this is stored in the data subfolder of
the samples. Then we create a StreamWriter that will create the output file, and call
the Transform method to generate the new HTML document. The second parameter to the Transform method
is a reference to an XsltArgumentList instance, but as we have to arguments to pass to the style sheet we use Nothing for this
parameter.
...
Dim sPath As String = Path.Combine(Request.PhysicalApplicationPath,
"data")
Dim transform As New XslCompiledTransform()
Try
transform.Load(Path.Combine(sPath,
"sales-transform.xslt"))
Using writer As New StreamWriter(Path.Combine(sPath,
"sales-report.htm"))
transform.Transform(xmlr, Nothing,
writer)
End Using
lblMessage.Text = "Created sales
report <a href='data/sales-report.htm' target='_blank'>" _
&
"sales-report.htm</a>"
Catch ex As XsltException
lblMessage.Text = ex.Message
End Try
End Using
End Sub
Figure 5 shows this example page in action.
You can see the hyperlink that is created after the new HTML file has been
generated, and clicking on this hyperlink displays the HTML page in a new
browser window.

Figure 5 - Creating an HTML
document from an XML document generated through an XQuery
Summary
In the previous article, we discussed the great new features
in SQL Server 2005 that make it easier to work with XML documents, and also allows
you to write more efficient data access code when you need to persist, access
and update XML. SQL Server 2005 is not the first database to provide these
features, but the combination of these new features and integration with
ADO.NET make it easy to take advantage of them, while vastly reducing the
amount of code you have to write.
In this article, you saw this demonstrated in several
examples. We looked at some simple techniques for extracting data from an xml column using
XQuery, and then compared the performance of XQuery with the traditional
approach of loading the entire document into an XML parser and extracting the
required values. As you saw, XQuery reduces the bandwidth requirements by
limiting the data returned over the network, removes the need to use an XML
parser client-side, and dramatically reduces processing overhead and response
times. And, when the XML document is stored in a typed xml column, the performance improvements
are even more noticeable.
Then we looked at some different ways you can pass values to
a query that contains an XQuery statement - pointing out one of the common
pitfalls when selecting rows to be returned. Finally, we saw how you can use
XQuery to create custom XML documents from the data stored in an xml column, and
then apply a style sheet to the result to transform it into any other format
you might require. Of course, XQuery isn't limited to generating simple scalar
values and XML documents or fragments. You can use XQuery to generate any
output you want, for example you could generate comma-separated or
tab-separated data directly within your XQuery if this is the format that your
applications require.
In the next and final article in this series, we'll delve
deeper into XQuery and SQL Server 2005 XML features to see how we can improve
performance when updating XML documents stored in the database, and how we can
use XQuery within managed code stored procedures.