Shuffling a recordset so that it can be displayed in random order is very useful when programming dynamic Web pages. A database table containing a list of supported vendors or links to advertisers may need to be displayed in a different order each time so that none complain about always being last on the list.
Others have solved this problem by writing ASP code to reorganize the recordset by copying data into an array and then rearranging the data in memory (see http://www.asp101.com/articles/john/randomize/default.asp). An alternative solution is to have SQL arrange the recordset for you by using an ORDER BY clause on a calculated column added to the query. Unfortunately, the calculation necessary to produce a uniformly random recordset is not straightforward, but this article will review several approaches and point out one that works reasonably well. The examples were written using SQL Server 7.0.
The built-in random number generator (rand) provided by SQL might be a first approach to randomizing a recordset. The random number generated will be the same for each row of the recordset, unless you seed the random number generator with a different integer value. In the example below, the primary key (id) seeds the random generator.
SELECT rand(id) sortid,* FROM tblname ORDER BY sortid
Often your database is configured to generate the same set of random numbers from a given seed, so this expression gives predictable rather than random results. Most likely this approach won't do the job.
You might fix the deficiency in rand by generating a random seed in your ASP code and imbedding it in your SQL expression.
<%
Randomize
R=clng(1E6*rnd)
F=1+clng(1E3*rnd)
Query= "SELECT (id*" & F & "+" & R & ")%1000) sortid,* " &_
"FROM tblname ORDER BY sortid"
DBObj.Execute (Query, RC, 1)
%>
This will produce unpredictable, but not necessarily random, results. The problem remains that id is the only factor determining the relative order within the recordset.
The lack of uniform randomness is illustrated in the above frequency histogram. It shows the relative frequency that a particular record (labeled A through Z) is first in a 26-record recordset when the same query is executed hundreds of times.
An alternate approach is to calculate a psuedorandom number using the primary key and the millisecond and second values from the system clock.
SELECT (id*(1+datepart(s,getdate()))*(1+datepart(ms,getdate())))%1000 sortid,* FROM tblname ORDER BY sortid
The expression datepart(ms,getdate()) returns the millisecond value of the system clock, while datepart(s,getdate()) returns the seconds. The number 1 is added to both so that the factors are never zero. This random clock factor is then multiplied by the primary key (id). Finally, using the modulo expression %1000, just the last three digits of the result are selected to eliminate the order inherent in the id value itself.
With a fast machine and an imprecise clock, each record is computed with the same value for the millisecond value. This technique produces unacceptable results as is obvious from the following skewed frequency histogram.
This third approach improves the randomness of the results by using other psuedorandom database fields like last update date, part numbers, or numeric values, in place of the system clock factors. For example:
SELECT (997*id+datepart(s,LastUpdate)*datepart(ms,LastUpdate)+datepart(ms,getdate()))%1000 sortid,* FROM tblname ORDER BY sortid
The prime number 997 is factored into the calculations to ensure that the calculations produce numbers larger than 1000. The uniformly distributed histogram shown below proves that this calculation produces a better random recordset compared to the previous examples.
When developing a formula to produce random numbers from data already in the database, use both addition and multiplication to maximize the randomness. Devise the formula with a large range in random number value (e.g., 0 to 1000) so there are few repeating values. Repeated random numbers in a set will give precedence to some records, usually those stored first in the database.
Whichever formula you choose to use to produce random numbers, you should test the results for uniformity. A database with a different range of values could produce nonrandom results from a formula that worked great in a previous situation. Here is sample ASP code that will run a query multiple times and then display a frequency histogram like the ones shown throughout this article.
<%
Response.Expires=-100000 ' This is to prevent browser caching
' A connection string and query are specified below
Set DBObj = Server.CreateObject("ADODB.Connection")
DBObj.Open "PROVIDER=SQLOLEDB.1;DATA SOURCE=xxx;UID=SA;PWD=;DATABASE=DBxx"
Query="{{place your query here}}"
%>
<HTML>
<HEAD></HEAD>
<BODY>
<table cellspacing=2 cellpadding=0 border=0>
<tr>
<%
Dim Freq(100) ' Adjust this number if your ids are > 100
For I=1 to 500 ' The query is run 500 times
Set RS = DBObj.Execute (Query, RC, 1)
id=RS(0) ' first column in the recordset is a unique id key
Freq(id)=Freq(id)+1
RS.Close
set RS=nothing
Next
' Display the histogram. Supply your own small colored gif (ColorSwatch.gif)
' to improve chart appearance
For I=1 to 99 ' Adjust number to match your range of ids
Response.Write("<td valign=bottom><img src=ColorSwatch.gif border=1 " &_
" width=10 height=" & Freq(I) & "></td>")
Next
DBObj.Close
Set DBObj=nothing
%>
</tr>
</table>
</BODY>
</HTML>
Recordset ordering can sometimes be more difficult than these simple random rearrangements. In one project, a customer wanted all records in a specified ZIP code to float to the top of the otherwise randomly ordered list. To complicate matters, the list needed to have static values that were not in the database mixed in with the dynamically generated recordset. Here is one solution:
SELECT 1 sorttype,id,rand(id) sortid FROM tblname where zipcode=20164
UNION
SELECT 2 sorttype,id,rand(id) sortid FROM tblname where zipcode <> 20164
UNION
SELECT 2 sorttype,923,rand(id) sortid FROM tblname where id=78
ORDER BY sorttype,sortid
In this solution, several recordsets are combined using the SQL union command. A calculated field (sorttype) is used to give precedence to records with a ZIP code of 20164. The static record is merged into the resulting recordset by entering the values instead of field names. The resulting recordset is randomly reordered using the id field. The static record reuses an existing record's id (but not its data) so that it is properly randomized.
The actual random number calculation for sortid has been omitted and rand(id) is used as a placeholder. As explained above, rand(id) likely will not produce a uniformly distributed random sequence, and you should replace it with a calculation that does.
Randomizing a recordset is tricky because your first approach may not produce uniformly random results. However, with the frequency histogram tool provided and a few examples on how to calculate random numbers from existing database data or the system clock, you should be able to randomize your recordset directly in your SQL expression. More elaborate recordset randomizing can be achieved by using UNION to join several queries together.
UPDATE
Several readers (thanks!) pointed out the NEWID() function in SQL 7 and SQL 2000 produces a more uniform set or random numbers that can be used to order the recordset. This simplifies the SQL to something like this:
SELECT id,* from tblname order by NEWID()
Here's the frequency histogram to show uniform distribution.
