|
Introduction
Having mentioned the ability to execute stored Access database queries from ASP via ADO in a thread at SitePoint Forums, Dave Maxwell asked what the speed difference was (if any) between these stored queries and traditional hardcoded dynamic queries. I had to confess that I didn't know, but I said that I would find out. This article is the result of the research that followed ...
Performing Queries on an Access DB
Access databases are usually queried from ASP by executing an SQL statement using the RecordSet object. Sometimes developers use the Command object instead, but it's rarely for any particular reason, rather it's just the way they've always done it. These SQL statements tend to be hardcoded into the ASP script, sometimes with variables derived from a querystring or a posted form inserted into the SQL, usually as part of a WHERE clause, and so queries generated and executed in this way are usually referred to as "Dynamic Queries".
When developing an Access database using the desktop Microsoft Access application, you are able to create and store queries. These are usually referred to as "Stored Queries", similar to SQL Server's "Stored Procedures", but less powerful. They are only SQL statements, no more, no less; they cannot do what Stored Procedures do on SQL Server!
Stored Queries can be called from ASP, which means that you don't have to hardcode queries into your ASP code. This is good because it means that if you need to alter a query, you don't have to edit an ASP file, just the Stored Query in the database. This is particularly helpful if you execute the same query on more than one page. Why hardcode it many times when you can code it once, and refer to the same query in each script. This improvement in administration is usually the reason that articles recommend using Stored Queries.
However, it appears that there may also be another, more significant gain from using Stored Queries - performance. But this is never backed up with figures; it's usually just an unsupported claim! Stored Queries suffer from a severe lack of exposure on the Web (I bet that many of you will never have used them in ASP before because you hadn't heard of them, or didn't know you could), and so it is no surprise that I have never come across an article examining the performance aspects of their use.
Hence, I decided to perform my own investigation into Stored Queries performance gains. However, it's not quite that simple. ADO provides three types of objects which can be used to execute a query (RecordSet, Command, and Connection), and each has a slightly different way of performing the execution. Naturally, since they are all completely different objects, one would also expect them to have different overheads. Add to that the fact that each object supports more than one way of performing queries, and the investigation became quite complicated. We must examine the performance of each of the two types of queries using each of the ADO objects and using as many different querying methods as possible for each object!!
Incidentally, although this article (and site) is focused on ASP development, the information presented here will probably be useful to developers querying Access databases in other languages which use the ADO objects, such as VBA, VB, C++, and J++.
Testing Methodology
I decided to divide my testing into two separate benchmarks - one for non parameterized queries, and one for parameterized. Following Access benchmarking tradition, I chose NORTHWIND.MDB (the example "Northwind Traders" database that comes with Access) as my test database and used two queries already present in the database - the "Ten Most Expensive Products" for the first benchmark and "Employee Sales by Country" for the second (with parameters of 01/01/1980 and 31/12/2002 which were arbitrarily chosen in order to select the entire range of results). Each benchmark executes the queries in as many different ways as possible, outputting the results for each query method separately. In order to refer to the methods in a consistent way I labeled each method with a two-character identifier (a numeral for the object type, and a letter for the query syntax used for that object).
Since the two queries have completely different execution times, the first benchmark performed 2000 loops per method, and the second only 200 per method. These numbers were chosen as good tradeoffs between accuracy of results and benchmark running time. The fact that a time from one benchmark cannot be compared to a time from the other does not matter, since any such comparison would be meaningless anyway because each benchmark executes completely different queries.
I would ideally have liked to perform stress-testing analysis on the different methods in addition to speed-based benchmarks, but unfortunately, I don't have access to an appropriate testing environment. Should you want to find out more about this, there are plenty of articles available on the Web, including this one be Microsoft (which uses Microsoft's freely available WAST stress-testing tool):
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnserv/html/server092799.asp.
Code
When performing a benchmark, be careful that the code does not compromise the results by introducing errors, biasing a particular method or otherwise affecting the outcome. Below are the steps that I took to try to ensure that my tests were fair and my results valid:
- Four separate core functions are required to cover all the querying methods. In each, I have intentionally made these less modular in design than usual. In each case the database-querying code is executed directly within the loop, rather than as a function being called from within the loop. This approach eliminates any latency incurred by calling functions.
- Within this loop, the required database objects are created, used, and destroyed once per loop (no objects are reused between iterations in the loop). This ensures that the full "cost" (time) of the query is measured; to do otherwise would be not be a fair test. The only exception to this rule is the Connection object (which is created and destroyed only once per benchmark script). It is not fair to include it in the cost of a query because queries always need an active connection. Besides which there are a multitude of ways to open a connection to an Access database, which may well be worth investigating in another article. In addition, if the opening & closing time of the connection were to be included, the connection caching system that IIS/ADO implements may affect the results, which is to be avoided.
- As soon as the query has been executed, the next lines close and destroy the returned RecordSet object, since this is yet another aspect of DB querying which has many different methods (with varying execution times), and is already relatively well-documented elsewhere. Not retrieving the data excludes this factor from the experiment, allowing the query execution speed to be measured more reliably.
- The statements directly before and after this loop capture the time so that nothing else can happen in between (which might otherwise influence the results).
- Finally, "Option Explicit" is set and all variables are dimmed accordingly.
It is important to note that since the timing was performed by using the VBScript Timer() function, there is an inherent accuracy limit of approximately 4 milliseconds. If I were timing each individual query, this would be a significant source of error, however due to the high number of iterations and the fact that it is the execution of the entire loop that is timed instead, this error is insignificant and shall therefore be assumed to be zero. This could perhaps be virtually eliminated by using a custom or commercial timing component, but I wanted to make sure that others would be able to execute this benchmark without the need for any component installation or system configuration changes.
Please note that these benchmarks are performed on an Access 2000/XP database (which uses JET4) using an OLEDB JET4 driver, since other versions and connection methods are recognised to be significantly slower in many respects, and are now virtually defunct.
The full benchmark code, database, and spreadsheet used to perform the testing can be downloaded as a ZIP file here
Results
Test machine configuration:
- AMD Athlon T-Bird @ 1.4GHz + 768MB SDRAM
- Windows XP Pro
- IIS 5.1 + VBScript 5.6 + ADO 2.7
- Access 2000/XP (JET4) format Northwind.mdb database + OLEDB JET4 connection driver
This machine acted as both client and server, and the machine was rebooted and all unnecessary tasks closed before benchmarking, with the system running at 99% idle (100% idle is impossible to achieve) before and after running the benchmarks.
Benchmark 1 (plain queries, no parameters, executed 2000 times each):
Benchmark 2 (queries with parameters, executed 200 times each):
Notes:
In order to stabilize the results, each benchmark was run an extra two times before the results recorded above. This ensured that any effects of caching & indexing (and anything else that might need to be initialised) were reduced as much as possible.
Be aware that these benchmarks will severely strain your Web server for a couple of minutes each time they are run, so don't run them on a live/production server!
Analysis of Results
In the first benchmark, the slowest query was method 2a (7.300 secs). Method 1a (the method that most scripts and/or developers use to query Access databases) ranked third slowest at 6.780 secs, which comes out at 7% faster than 2a. The fastest method was 3d (4.691 secs) - executing a Stored Query using the Connection object with the EXECUTE syntax. The difference between these fastest and slowest times is 2.519 seconds, which means that using method 3c instead of 1a results in a performance increase of 36%!! That's more than a third faster!
In the second benchmark, the slowest was 3a (8.116 secs), with the traditional method 1a (8.046 secs) coming in second slowest, and just under a measly 1% faster! The fastest method this time was 2e (7.477 secs), with a difference of 0.639 seconds compared to the slowest. This is clearly much less than in the first benchmark, but this still translates into a performance difference of 8%, which is still quite significant!
Also, examining the second benchmark's results for 2b & 2c, 2d & 2e, and 2f & 2g (the two methods in each pair differing only by the way in which the query parameters are passed) reveals that using the Parameters collection instead of appending parameter values to the end of the stored query SQL statement results in a relatively minor performance improvement of about 100ms on average, a result that I found surprising. I must admit that I had been expecting to see a more significant performance from using this collection - perhaps the difference might be more marked for queries which require a greater number of parameters (I may investigate this in the future).
Conclusion
There are two main conclusions that can be drawn from these results.
- First, if you are executing the same dynamic queries every time in your script (i.e. without any variables), significant performance gains (in this case, 36%) can be achieved simply by switching to Stored Queries. With this comes the aforementioned fringe benefit of increased ease of query administration. For queries which do contain variables, there are still performance gains to be had, but they are less significant (8% for the query tested).
- Second (and contrary to my own expectations), using the Command object's Parameters collection in preference to simply appending the parameter values to the end of the stored query SQL statement yields only a minor performance increase. In practice it may be worth completely avoiding this collection, especially since it is since the extra complication that it adds to your scripts (i.e. creating parameter objects, defining the data type, setting the value, and then adding the parameter object to the collection) may not
However, as with all speed-up recommendations that you read, don't just take my word for it; run the benchmark on your own test server and see for yourself. Also, be aware that the results are likely to vary depending on complexity of the query you are executing and the number of the records in the tables used by the query. However, in tests I performed on the other queries using the benchmarks above, I still achieved performance increases of in the range of 30-40% for the first benchmark, and 5-10% for the second. It is clearly important that you test your own database queries with the benchmarks before you implement them in your scripts. Use whatever works best for you.
Other Items of Interest
It is worth mentioning that (just like everything else in an Access database) stored queries can be added, modified, and deleted programmatically via the ADOX objects. Here's a simple function I have written to make adding one as easy as possible:
'Adds a stored query to the database
'Example usage: Call AddQuery(myConn,"CustomerNames","SELECT ID, Forename, Surname FROM Customers")
Sub AddQuery(Conn,QueryName,QuerySQL)
Set Catalog = Server.CreateObject("ADOX.Catalog")
Set Catalog.ActiveConnection = Conn
Set Cmd = Server.CreateObject("ADODB.Command")
Cmd.ActiveConnection = Conn
Cmd.CommandText = QuerySQL
Catalog.Procedures.Append QueryName,Cmd
Set Catalog = Nothing
Set Cmd = Nothing
End Sub
See the ADO SDK documentation, the Jet SQL reference in the MS Access help file, and the source of the parameterized queries in the Northwind Traders database for more information and examples of Stored Queries.
Further Reading and Research
Before I started this investigation, I was able to track down a couple of other articles which mention using Access Stored Queries, but they do not cover the performance aspects that I have investigated here. However, they may make interesting follow-up reading:
- Parameterized Queries in Access
http://www.4guysfromrolla.com/webtech/021799-1.shtml
Although the author makes the claim that querying using stored queries is faster that dynamic queries, it is not backed up with any figures, and only uses method 2e.
- Running Stored Queries in Access Database
http://www.stardeveloper.com/articles/010701-1.shtml
This is a walkthrough guide covering how to use Access stored queries in an ASP page. It covers only calling method 5, but does not make any performance claims, and hence features no figures. I highly recommend this article for those who are unfamiliar with Access stored queries.
The best source of reference material for ADO objects and methods is of course the ADO documentation itself, which can be downloaded (as part of the ADO 2.6 SDK) from http://www.microsoft.com/data/download_260SDK.htm .
Of course, there are a multitude of other performance-related topics and techniques which ASP developers should to be aware of - including:
- the database connection method you use
- the way in which you retrieve the query results from the RecordSet (using GetRows & GetString can bring phenomenal speed increases in some cases)
- RecordSet cursor and lock types
- creating objects late and closing them early
- how you use Application & Session collections
- writing high quality VBScript code
- construction of your database
This brief list is by no means exhaustive, so get reading!!! Particularly good sources of information include 15 Seconds, adOpenStatic, MSDN, 4GuysFromRolla and LearnASP.
Final Words
I hope that this article has been enlightening and is successful in convincing more developers to use more efficient database querying methods when using Access databases, as well as promoting the use of Stored Queries. Most developers don't seem to realise that there are different performance penalties for each ADO query-capable object, especially not as significant as a 33% difference!
If anyone does perform proper stress-test analysis on these different querying methods, I would be very interested to hear the results! However, you can't simply stress-test the benchmarks (they are not designed to be executed in parallel) - new code designed for stress-testing would need to be written, but it wouldn't be particularly complicated. Also, if you get any extreme results (i.e. amazing speedups, or no change at all), please let me know!
About the Author
Marcus Tucker is currently developing a student-oriented Internet service that he conceived while studying at Southampton University. It is due to go live in late autumn. Based in the UK, he has seen the Internet grow and evolve rapidly during his lifetime and has been developing for the Web since 1996. His current field of expertise is ASP & VBScript (which he has over 5 years experience with) and he is a frequent poster at www.sitepointforums.com, where he enjoys solving problems and helping others learn about ASP, XML, and DHTML in addition to more general programming techniques.
Marcus can be reached at info@marcustucker.com, and welcomes all feedback on this article and related topics.
|