We usually think of data in sets of related values, but not always. Occasionally, you may need to supply someone with a set of unrelated records, retrieved completely at random. Perhaps a manager wants to predict future growth or an auditor is looking for anomalies. Now, it would be nice if Access offered a random query option, but it doesn't. If you are using Microsoft Access, the good news is that a random set of records is just a simple expression away.
This blog post is also available in PDF form in a TechRepublic download, which includes a sample database you can use to try the techniques mentioned.
Access' Rnd() function is at the heart of this random-selecting technique, but it doesn't retrieve random records. Using the following syntax, this function returns a random value between 0 and 1 where value is a numeric value or expression:
At this point, you might be wondering if the returned value is truly random. No, it isn't, but it's random enough to satisfy most needs. Access maintains a list of random values and a seed value. When Rnd() retrieves a random value, the seed value bookmarks that value. That's how Rnd() knows which value to grab next; Rnd() finds the seed value and grabs the next value in the list. (That's a very simplified version of what really happens.) The seed value isn't the only factor Rnd() considers. Knowing how Rnd() responds to the type of value you pass is the key to success:
- When value is greater than 0, Rnd() returns the next random number.
- When value is less than 0, Rnd() returns the same random number, based on value. If value occurs only once, you won't notice this behavior. Access also resets seed, which means the sequence starts all over again.
- When value is equal to 0, Rnd() returns the most recently generated random number.
The simple query in Figure A uses the following expression to return a random value for each record:
Rnd() works as you might expect -- mostly. Each value greater than 0 returns a seemingly random and different value. For instance, the value 1 occurs more than once in the natural data, but Rnd() returns different random values for each. What you might not expect is that values less than 0 generate the same random value. Each occurrence of -3 and -8 generate the same random value.
Rnd() behaves differently for positive and negative values
Notice the last two records, where FieldValue is -3 and 1. Those values occur earlier and Rnd() returns the same random value for both, in sequence because -3 resets the seed value. This is noticeable only because the values repeat in sequence, but it's noteworthy.
Selecting random recordsNow that you know how Rnd() generates random values, you can put the function to work retrieving records, at random. To do so, simply sort the records by the random values, as shown in Figure B. Generally, it won't matter whether the sort is ascending or descending. At this point, all the records are still present, you've just sorted into what seems a random order, based on only the natural data.
Sort by random values
To select a specific number of records at random, use the query's Top Values property. For instance, to random select five records, you'd set the Top Values property to five, as follows:
- With the query in Design view, right-click the background in the upper pane and choose Properties from the resulting submenu.
- Select five from the Top Values settings, as shown in Figure C. Alternatively, simply enter it.
Set the Top Values propertyAs you can see in Figure D, the query returns six records instead of five. That's because 0 and the previous value produce duplicate random values. That's just one of the reasons it's important to understand how Rnd() generates values.
A Top Values setting of 5 returns six records because the fifth and sixth records have a duplicate value
Now, close the query and rerun it. Notice that Access returns the same records. That's because the negative values reset the seed value. That won't always happen. If Rnd() only encounters values greater than 0, Rnd() will generate different values each time you run the query.
Solutions for Rnd() problems
Repeating random values might not be a problem. However, when they are you need to identify them and work around them. There are really two problems:
- Eliminating repetitive random values.
- Forcing Rnd() to generate a new set of random values when the natural data contains values less than 0.
The easiest solution solves both problems. Simply eliminate the negative values using the Abs() function as follows:
RandomValue: Rnd(Abs(FieldValue))Wrapping the passed value in Abs() allows Rnd() to treat all values as positive. That eliminates repetitive values within the query itself and in the next run. It won't however, eliminate the repetitive value returned by 0, as shown in Figure E.
Another problem is that Access begins each session with the same seed value, which negates the random quality when you repeat the process, the next time you open Access.
A value of 0 still repeats the most recently generated random valueIf you must maintain the negative values in your random task, or if the natural data doesn't have a numeric field, use the user-defined function (UDF) in Listing A. It uses the Randomize function to reset the seed value for each value. In other words, the function starts with a new random seed value each time Access executes the function. Then, the function simple returns the value of Rnd(1) to the calling expression.
Public Function GetRandomValue(fld As Variant)To create the function, launch the Visual Basic Editor (VBE) by pressing Alt+F11. From the Insert menu, choose Module. Enter the GetRandomValue() function, save the module, and return to Access. To call the function, open the query in Design view and enter the following column expression, as shown in Figure F.
GetRandomValue = Rnd(1)End Function
Be sure to set a sort order (it doesn't matter which) for the RandomValue column. The example uses Ascending.
Call GetRandomValue() from a queryThe resulting query, shown in Figure G, doesn't suffer from the Rnd() function's peculiar behavior regarding negative values and 0. That's because the query calls the UDF once for each record and each time, the Randomize statement resets the seed value. In addition, this flexible UDF doesn't care what value you pass. The example passes FieldValue, but the UDF doesn't require a numeric field.
Use a user-defined function to eliminate repetitive values from a random set
Knowing how to use Rnd() is the key to getting a truly random selection, without any surprises. While Rnd() doesn't return a true random value in the cosmic sense, it will satisfy most random task. Keep in mind that the nature of random values allows for repetitive values. However, it's your job to discern the difference between a truly random, but repeating value and a repetitive value that Rnd() generates on purpose.
Susan Sales Harkins is an independent consultant and the author of several articles and books on database technologies. Her most recent book is "Mastering Microsoft SQL Server 2005 Express," with Mike Gunderloy, published by Sybex. Other collaborations with Mike Gunderloy are "Automating Microsoft Access 2003 with VBA," "Upgrader's Guide to Microsoft Office System 2003," "ICDL Exam Cram 2," and "Absolute Beginner's Guide to Microsoft Access 2003" all by Que. Currently, Susan volunteers as the Publications Director for Database Advisors at http://www.databaseadvisors.com. You can reach her at firstname.lastname@example.org.
Susan Sales Harkins is an IT consultant, specializing in desktop solutions. Previously, she was editor in chief for The Cobb Group, the world's largest publisher of technical journals.