Many articles have been written about how to send an Extensible Markup Language (XML) file and an associated Extensible Stylesheet Language (XSL) file to Web clients to reduce processing load on the server.
Unfortunately, in the real world this isn't going to solve many problems because you are left with a few issues that won't go away:
Not all of your visitors have an XML/XSL-capable browser installed. For those who don't, you'll have to perform the processing on the server.
For those who do have Internet Explorer (IE) 5.0, it is unlikely many will have installed the latest Microsoft XML Parser (MSXML) release. In fact, the version that shipped with IE 5.0 is only 1.0,which doesn't support a number of the funky features of XML, XSL Transformations (XSLT), and XML Path Language (XPath)). Even if you are prepared to write different style sheets for each one of these releases, you will still find it hard to determine what MSXML version is installed.
Although you can filter the output using XSL pattern matching, all the XML data still needs to be downloaded to the Web browser so it can be processed. This gets very slow for the client if they only ever need a few nodes out of the XML document.
More recently, a number of articles have been written about how to process XML/XSL on your server to eliminate client-side compatibility issues. Once again, applying this to the real world has its own set of problems:
XSL syntax is confusing to learn and can make it difficult to perform integrated processing of data (such as rendering data based on both data from the XML document and variables in your ASP/COM layer).
XSL transformations are not fast, and if your XSL style sheets incorporate scripts, it can slow the rendering down significantly.
In most cases the XML document has to be parsed into a XML Document Object Model (DOM) on a per-page basis, which can become very slow, inefficient, and memory/CPU intensive.
But don't despair. There are still some incredible things you can do with XML on your server, and it works today. It doesn't rely on the client having anything other than HTML support. Designers won't have to learn XSL after all, and it's completely free!
If you are in the same situation as my workplace, you may even save the company a large amount of money by the time you're done.
Where You Can Go Today . . . In-Memory Databases Anyone?
When Microsoft first released the specs for Windows 2000, it promised an In-Memory Database (IMDB) component that would be optimized for read-only operations. The possibilities were endless, allowing you to reduce the load on SQL Server and cache the information on your Web servers. Instead of the time/overhead of repeatedly dragging the same information from the database server to the Web client via ActiveX Data Objects (ADO), all the data you need would be right there in memory, only a process away.
Unfortunately somewhere between Beta 1 and Beta 2 Microsoft decided this was a bad idea and dropped it from Windows 2000. The following quote appeared in an MSDN article explaining the decision:
"After evaluating customer feedback from the Microsoft(r) Windows(r) 2000 Beta Program, we've determined that the In-Memory Database (IMDB) does not completely address the needs of most customer scenarios. The IMDB does not support query processing or stored procedures. It requires customers to implement a different database access method." (See http://msdn.microsoft.com/library/techart/whatimdb.htm.)
But as you may have guessed,In-Memory Databases have now become a reality by using XML. The newest release of the XML components, Microsoft XML Parser (MSXML) 3.0, provides both the components and the performance to make this a reality. You are also only limited by your memory capacity because you can get away with storing significant amounts of hierarchical data.
By using the XML's XPath query processor with an XML IMDB, you can create an extremely powerful search engine that can outperform dedicated databases.
In fact, from testing (see part 2 of this article) the performance was such an improvement over SQL Server 7 that you may wonder why you ever considered using a dedicated database server for search operations.
XML vs. SQL Databases
Before considering eliminating the need for SQL -Server, note an IMDB is not a replacement, but a great place to cache commonly accessed data. The MSXML technology doesn't provide persistence, stored procedures, transactions, centralized storage, job schedulers, etc., out of the box, and probably never will.
If your company runs a popular Web site,in most scenarios the site's bottleneck occurs when getting data out of your database and into your ASP scripts so that it can be displayed to the end user. If the database server has to perform a complex search operation in order to fulfill these requests, then the database server will start to show signs of stress well before your Web servers do. No matter how much you optimize the ASP/COM layers, on a busy data-driven site, your application will be waiting while SQL Server runs your query, packages up the results, and delivers them back to your calling application across an overloaded network.
If you are running a separate SQL Server, upgrading this machine or purchasing an additional one becomes expensive very quickly (4/8/16 x CPU hardware, SQL licensing costs, SCSI Redundant Array of Independent Disks [RAID] arrays, etc). On the other hand, scaling horizontally and buying Web servers is relatively cheap (up to a 20th of the cost), and is a more desirable option. In this case you will want to move as much load as possible locally onto your Web servers.
An IMDB is a great way of taking a load off SQL Server!
Ideal Candidates for IMDBs
Determining What Data Should Be Moved to the IMDB
Determining the best use of an IMDB will require some analysis, profiling, and monitoring of your live site.
Windows provides its own performance-monitoring tool that provides real-time figures for analyzing CPUs, Disk I/O, ASP requests, and more..
Another great tool for working out exactly what your database server is doing is the SQL Server Profiler that is bundled for free with Microsoft's SQL Server. This program basically dumps all the integrated Transact SQL (T-SQL)commands that have been sent to SQL Server into a file, along with their associated CPU usage, time taken, and number of read/writes.
Be warned, however, that for a particularly busy site you will only want to turn the Server Profiler on for a short burst of time (i.e., 5 - 10 minutes) because the amount of data created by the profiler will run into the gigabytes very quickly. (This can chew up all your free hard-drive space on production, may be difficult to analyze, and even harder to download to your development environment for analysis). Additionally, this product is somewhat unstable and can crash your database server if used too frequently.
Another key identifier for determining how your site is running - and what people are doing -- is found in the IIS Log Files (see http://msdn.microsoft.com/library/periodic/period01/EventLogging.htm). Tools such as WebTrend Corp.'s Log Analyzer provide detailed reports on site usage, such as how many times a page has been requested.
Finally, load-testing tools that can be run in your development environment can help here (i.e., Microsoft's Web Application Stress Tool (WAS) (see http://msdn.microsoft.com/library/techart/d5wast_2.htm), Mercury Interactive Corp.'s LoadRunner, etc.). Although, unless you create test scripts by reverse engineering your IIS logs, you may be inadvertently biasing the results.
A Real-Life Scenario
Let's see how this could be applied in a real-life scenario (sorry, but names/industries have been changed to protect IP!).
My employer runs a real estate portal on the Internet that hosts a database containing all the houses for sale in Australia and New Zealand. During peak season the site holds about 40,000 advertisements, which can be searched by end users (specifying criteria such as a location, a house type, etc.).
In this scenario (where we have a single 4xCPU SQL Server machine and four Web servers each boasting a dual CPU setup) the situation was something along these lines during peak times:
The CPU utilization on our Web servers would hover around 25 - 30percent as they spent most of their time waiting for requests to be fulfilled.
The CPU utilization on our SQL Server was about 95 percent and response time on queries was significantly longer than when the site was not under heavy load.
The file system on the database server was fast and not under stress so it was ruled out as a possible bottleneck.
Figure 1.1 - Simple data-flow view of Web farm used to host site.
The initial reaction to this problem was that a new database server would be required very quickly in order to sustain the ever-increasing number of visitors (usage was going up by about 10percent per month).
On investigation, it was found that almost 75percent of SQL Server CPU activity was directly attributed to s0earch operations on the table containing 40,000 active advertisements. Even more surprising was that this only actually accounted for 2.5percent of all page requests (which translated to as high as 14,000 search requests per hour).
The usage pattern showed that users tended to perform a search upon first visiting the site, and then spent time actually viewing the results before they went on to more complex functions such as organizing inspections or registering their interest. If the search didn't return enough quality matches or took too long, then the user would generally end their session and go elsewhere.
From this data it became very clear that by moving the search operations on the advertisement data to the Web servers (where a lot of spare CPU cycles where unused), it would not only improve the user experience and encourage further usage, but give the company significant more life from the existing database server. (The database server would only be 25percent utilized afterwards and would cost in excess of $50,000 to upgrade to 8xCPU if required.)
It would be too expensive to purchase a copy of SQL Server for each Web server, therefore another solution was required. That's where the XML IMDB came in.
Technical Implementation
So How Does It Work?
Implementing an IMDB with XML involves storing a preparsed XML document object into memory so you can quickly access its contents without having to reload/reparse it each time.
The parsing of a large XML file is not a quick operation by any means,and once you start getting files larger than a few hundred kilobytes, it becomes very inefficient doing this every time someone requests an ASP script. However, if you could store that already parsed document in memory (such as the application object) for easy access, then all of a sudden it starts becoming a very attractive option.
As many of you will be aware, the MSXML 2.5 onwards provides a "Free Threaded" versio, along with the standard "Rental" version provided in earlier releases, making it possible to persist the parsed document's Document Object Model into an IIS application variable. (See the following 15Seconds article for more details: http://www.15seconds.com/issue/990527.htm)
In addition to MSXML's ability to store data into memory, we'll want to take advantage of its ability to quickly get to specific data and display its contents.
Traditionally, XSL style sheets (the open standard for transforming XML data) have been used to perform this sort of activity, however, when you want to do complex analysis of the data being returned, this gets somewhat tricky.
The alternate way to query your XML data is by using an XPath query along with the .selectNodes method of the MSXML DOM. XPath is a subset of the XSLT standard and provides a few features that make it really powerful, such as hierarchical pattern matching with advanced Boolean logic. XPath is a relatively simple concept to anyone who has written SQL queries "where" conditions and is a surprisingly powerful alternative.
Upon calling the selectNodes method, a collection of matching node objects will be returned. Iterating through this collection with ASP is very similar to iterating through a Visual Basic (VB) Collection object or an ADO recordset, with the added ability to traverse upwards/downwards through the tree structure and incorporate other dynamic data into the output.
XPATH Is to XML as SQL Is to SQL Server
Let's take the following simple XML document representing the classic authors/publisher/books relationship:
<Authors>
<author name="Bob Bobson">
<publisher name="HD Press">
<book name="Optimizing DB Performance" price="29.95" isbn="12345"/>
<book name="Optimizing XML Performance" price="29.95" isbn="12346"/>
<book name="Optimizing ADO Performance" price="19.95" isbn="12347"/>
</publisher>
<publisher name="Training Ltd">
<book name="Teach Yourself XML in 3.5 Days" price="29.95" isbn="42345"/>
<book name=" Teach Yourself ASP in 3.5 Days" price="29.95" isbn="42346"/>
</publisher>
</author>
<author name="Jeff Jefferson">
<publisher name="HD Press">
<book name="UML in the Enterprise" price="29.95" isbn="34333"/>
<book name="Beginners Guide to UML" price="29.95" isbn="52346"/>
</publisher>
</author>
</Authors>
Some example XPath queries for this structure may include:
authors/author
Translates to return list of all nodes of type "Author."
authors/author [@name='Jeff Jefferson']
Translates to return the author nodes where the name attribute="Jeff Jefferson."
authors/author/publisher[book/@price='29.95']
Translates to return all publishers that have books selling for $29.95.
authors/author[(@name='Jeff Jefferson') or (@name='Bob Bobson')]/publisher
Translates to return all publishers that carry books written by Jeff Jefferson or Bob Bobson.
Please note that this is not a full XPath tutorial. Consult the MSXML 3.0 SDK for more detailed references. There is also an on-line tutorial at http://www.zvon.org/xxl/XPathTutorial/General/examples.html that shows how to use the XPath syntax with graphical examples.
Two Important Notes on Using XPath with the selectNodes Method
It is important to note that the XML DOM's selectNodes method that you will be calling on to execute this XPath query has a few quirks when using MSXML 3.0 and also may not be available in other XML implementations.
1) The selectNodes method may not be available if you aren't using MSXML 3.0.
The methods and properties available on the DOM are derived from the W3C DOM Level 1 Specification that defines two groups of programming classes: fundamental and extended. The W3C fundamental classes include those needed to write applications that manipulate XML documents. The W3C defines extended classes as those that might make programming more convenient for developers. (see http://msdn.microsoft.com/library/dotnet/cpguide/cpconxmldomfundamentalclasses.htm). One of the ideas behind this is to allow developers to seamlessly move from platform to platform, or implementation to implementation, without having to retrain themselves on how to manipulate the XML DOM. (For example, if you know how to program in Java, you will be able to switch over to another platform and manipulate your XML DOM using almost the same methods and properties.)
The selectNodes method is an extension to the W3C DOM, which means that you probably won't find this method on many of the Java implementations or other places such as Flash 5. The good news is that this will most likely become part of the W3C XML DOM in future versions.
2) MSXML 3.0 Defaults Itself to Support XSL Pattern Matching Instead of XPath
When using the selectNodes method in MSXML 3.0, the object will be defaulted to use simple XSL pattern matching. As a result, you will need to call the SetProperty method first to toggle the SelectionLanguage property. This property defaults to XSLPattern, however, you will need to set it to XPath. This will enable you to use the XPath functions listed in the documentation (such as contains(), name(), starts-with() etc).
Example 1 (setting to XPath syntax):
objXML.setProperty "SelectionLanguage", "XPath"
Example 2 (setting to XSL pattern matching - default):
objXML.setProperty "SelectionLanguage", "XSLPattern"
This fact is not made clear when reading the MSXML 3.0 SDK docs because all XPath help is located within the XSLT reference (which is the guide to authoring XSL style sheets).
Index Searching with XPath
In the previous set of examples, "Index" style searching was used by XPath to find the relevant set of nodes.I In other words, we performed a search based on exact field matching.
In a database this would be "denormalized" into authors, publishers, books, and price tables, and searching would be done on the books table using AuthorID, PublisherID, and PriceRangeID.
In a live Web site the same sort of denormalization would occur. The relevant IDs would, instead, be sent into the query directly from HTML form variables (from the related select elements).
In this case the query:
authors/author[(@name='Jeff Jefferson') or (@name='Bob Bobson')]/publisher
may become
authors/author[(@authorid='1') or (@authorid='2')]/publisher
The XML file would then end up looking something like this:
Suppose we want to find all books that contain "ADO" in the title. Simply asking for all books where the name is "ADO" won't help us here as "ADO" doesn't equal Optimizing ADO Performance.Therefore, the query won't return any useful results, unless of course we have a book simply called ADO.
One way of performing this action is to use XPath's built-in string functions, "contains" or "starts-with." This is also particularly useful if you need to perform wildcard-style matching (via the name(), contains, and starts-with XPath functions).
To perform a simple query (find all books with "ADO" in the title), the syntax would look something like this:
This uses the XPath contains() function which is very similar to VB's InStr function.
This operation could end up being quite slow if you have to scan a large number of nodes. If wildcard matching isn't supported on your site or, like us, you find that 99 percent of searches don't use wildcards, then there is a much more efficient way of performing this type of searching, by creating a "keyword index" for each record.
A keyword index is a list of all the unique words that appear inside a phrase. In our case we will convert everything to uppercase so when searches occur they are not case-sensitive. In turn, the keywords in the XPath query will also be converted to uppercase.
Let's take the phrase "Optimizing ADO Performance." This can be broken up into three unique keywords -- "optimizing," "ADO," and "performance." To allow us to search for these words, they would be included as child nodes to the Book node.
For example, the XML structure would now look like this:
This allows us to bypass the need to use the XPath functions in the following manner:
authors/author[@id='1']/publisher/book[(k/@name='ADO') or (k/@name='XML')]
Translates to "Find me all the books written by AuthorID=1 that have the word 'XML' or 'ADO' in the title."
authors/author[@id='1']/publisher/book[not (k/@name='XML') not (k/@name='XML')]
Translates to "Find me all the books written by AuthorID=1 that have the word 'XML' but not the word 'ADO' in the title.""
As you can see, by using this technique you can use quite complex Boolean logic.
If you do need to still implement wildcard matching, then you can still use XPath's contains() and starts-with() functions. The added advantage is that you can then correctly differentiate between matches that occur within the word or at the beginning of the word.
For example, if we have a keyword index - and we want to search for VB* as opposed to *VB*, then we would use the starts-with function instead of contains. If we don't break it up into a keyword index, then we can never be sure that we are finding our match at the start of a word.(Even if we look for a space beforehand, this will not work with formatting characters or words at the start of a sentence.)
Putting It All Together
ASP Example
So now that we've examined the technical side, let's look at how we might get this implemented using ASP/VBScript.
Firstly, let's consider what code needs to be placed in the global.asa file. This code is purely there to do two things:
Load and parse the XML file into an XML Document Object Model (DOM)
Persist the parsed DOM into IIS so it can be accessed within your ASP pages
<Global.asa>
Sub Application_OnStart()
<Other Code goes in here>
dim objXML
'Firstly Initialize the Free Threaded MSXML 3 DOM
set objXML = Server.CreateObject("Microsoft.FreeThreadedXMLDOM")
'Switch off Asynchronous Operation
objXML.async = false
'Load the File <XMLFILE>
if objXML.load(server.mappath(<XMLFILE>)) then
'If Loaded/Parsed Ok then Store the Object in the IISApplication Object
set application("XDOM") = objXML
application("XDOMLoaded") = true
application("XDOMError") = ""
else
'If Error than store the error message inside IISApplication
application("XDOMLoaded") = false
application("XDOMError") = "Loaded With Errors! - " & " Line : " & _
objxml.parseError.Line & " Char : " & _ objxml.parseError.linepos & " Message : " & _ objxml.parseError.reason & objxml.parseError.srcText
end if
set objXML = nothing
<Other Code goes in here>
End Sub
<searchform.asp>
This file would be a page that displays the search form to the end user. Once this form is submitted, the performsearch.asp page below will conduct the search. (I will not provide an example of this script here because I'm sure you know how to create HTML forms.)
<performsearch.asp>
This file actually performs the search using the selectNodes method. Results from this search are returned to the IXMLDOMNodeList object, which exposes a length property
Submitted. The performsearch.asp page below will conduct the search.
Dim objXML, objXMLNodeList, sXPathQry, objNode, lngLength
'Get the DOM from the Application
Set objXML = Application("XDOM")
'Call a function that constructs the appropriate XPath query from Form Vars
sXPathQry = ConstructXPath()
'Perform the Search - and return resulting collection to XMLNodeList Object
set objXMLNodeList = objXML.SelectNodes(sXPATHQry)
'Now Check for Results
if not objXMLNodeList is nothing then
lngLength = objXMLNodeList.length
if lngLength > 0 then
Response.write "Your Search returned " & lngLength & " Matches : "
For each objNode in objXMLNodeList
<Output Matches from objNode - eg. objNode.attributes.getnameditem("name").text>
next
else
Response.write "Sorry - No Results were found!"
end if
else
<PRINT ERROR MESSAGES - FATAL ERROR OCCURED>
end if
What's Next
In Part 2 of this article we will start to get to the juicy details, such as how to make this rock for huge amounts of data, benchmarking your solution, and scalability issues. We'll explore just how important the structure of your XML file will be (do we use tags, attributes, branches, etc?) - a list of golden rules for you to follow, and how we managed to make this solution perform over 20 times faster than the best SQL Server solution we used on our live site.
About the Author
Niall Ginsbourg is a chief Web architect at Seek Communications in Melbourne, Australia. He has spent the last years as a senior A/P, team leader, and consultant in various industries, specializing in Microsoft VB, SQL, ASP/Site-Server-based e-commerce and client/server solutions. Niall is self taught and also has knowledge in areas such as XML, WAP, Flash/Generator, Delphi, and extensive multimedia/CD-ROM production experience. He can be reached at nginsbourg@today.com.au.
Stonebroom.ASP2XML(c) is an interface component designed to make building
applications that transport data in XML format much easier. It can be used
to automatically pass updates back to the original data source.
Right now the latest buzzword around town is AJAX. AJAX is an acronym for Asynchronous JavaScript and XML and is a method used to implement remote calling. The problem is that AJAX is only implemented in ASP.NET 2.0. This article will show you one way to implement remote calling without using AJAX or the XMLHttpRequest object. The technique outlined can even be used from classic ASP and is sufficient for most remote calling needs. [Read This Article][Top]
This article is the third and final installment of Alex Homer's series covering the new XML support in Microsoft SQL Server 2005. In it he covers updating the contents of xml columns, comparing traditional XML update techniques with XQuery, and using XQuery in a managed code stored procedure. [Read This Article][Top]
In the second part of his series on SQL Server 2005's new XML support, Alex Homer looks at extracting data from XML columns, comparing traditional XML data access approaches with XQuery, and combining XQuery and XSL-T.
[Read This Article][Top]
Microsoft SQL Server 2005 now offers great support for and close integration with XML as a data persistence format. In the first article of his series examining this new support, Alex Homer offers an overview of how SQL Server 2005 stores XML documents and schemas, examines how it supports querying and manipulating XML documents, and provides a simple test application that allows you to experiment with XQuery. [Read This Article][Top]
In the final article of his series on reading and writing XML in .NET 2.0, Alex Homer looks at how the updated XML document store objects XmlDocument, XmlDataDocument and PathDocument can be used to read, persist and write XML documents and fragments more easily and more efficiently than in .NET 1.x. [Read This Article][Top]
In the final article of his series on reading and writing XML in .NET 2.0, Alex Homer looks at how the updated XML document store objects XmlDocument, XmlDataDocument and PathDocument can be used to read, persist and write XML documents and fragments more easily and more efficiently than in .NET 1.x. [Read This Article][Top]
Alex Homer continues his series on reading and writing XML in .NET 2.0. In part one, we focused on the reading side of things, examining the XmlReader and XmlReaderSettings classes. In this article, we move on to look at the XmlWriter and XmlWriterSettings classes, and how they can be used to write XML documents and fragments more easily and more efficiently than in version 1.x of .NET.
[Read This Article][Top]
Alex Homer continues his series on reading and writing XML in .NET 2.0. In part one, we focused on the reading side of things, examining the XmlReader and XmlReaderSettings classes. In this article, we move on to look at the XmlWriter and XmlWriterSettings classes, and how they can be used to write XML documents and fragments more easily and more efficiently than in version 1.x of .NET. [Read This Article][Top]
In the first part of his series on reading and writing XML in .NET 2.0, Alex Homer discusses the XmlReader and XmlReaderSettings classes. The XmlReader exposes several useful new features and the all new XmlReaderSettings class makes it easy to generate single or multiple instances of an XmlReader with a range of useful properties. [Read This Article][Top]
In the first part of his series on reading and writing XML in .NET 2.0, Alex Homer discusses the XmlReader and XmlReaderSettings classes. The XmlReader exposes several useful new features and the all new XmlReaderSettings class makes it easy to generate single or multiple instances of an XmlReader with a range of useful properties. [Read This Article][Top]
Mailing List
Want to receive email when the next article is published? Just Click Here to sign up.