Developer

Learn to handle sparse data with ColdFusion

Often tabular data created by a database query contains empty cells. These cells of sparse data can play havoc with your ability to output meaningful information. Using this technique, ColdFusion can overcome the problems created by sparse data.


When I say "sparse data," I mean outputting tables of data that have holes in them. As you can see in Table A, some cells are empty. Often when you need to output tabular data, the data comes in a fully-populated query. You just loop over the query, outputting each cell and each row. Sparse data, like the data in Table A, can mean that the query isn't fully populated; that some recordset elements are not present.
Table A
Reporting period
Canada
US
Europe
South America
Asia
Moon base
Mars base
Yuggoth operations
Jan. - Feb.
14
25
25
17
17
17
17
March - April
22
22
19
19
19
May - June
19
29
34
29
22
22
22
22
July - Aug.
18
34
30
34
22
22
20
Sept. - Oct.
39
29
39
18
22
Nov. - Dec.
16
30
28
30
25
25
25
25
Sparse data

To further complicate things, what if the number of columns is not constant, but variable? And what if the number of rows is also until runtime? When applied to Table A, this would mean that you don't know in advance what the regions or what the reporting periods will be. Listing A shows a Querysim that will generate a recordset that exemplifies how such data might be returned.

You'll notice that not every sales performer has sales data for every reporting period. How could you handle this and still generate a full table?

Additional information
You can get the handy and free <cf_querysim> tag at Hal Helms Web site. It lets you generate your own query recordsets.

Handling sparse data: Take one
The first step is to consider using three queries instead of just one. The first query tells you what the columns are, in the order you want them. A second query will tell you what the reporting periods are, in order as well. And the third query will give you the sales data in a recordset like the one generated by Listing A.

You can then loop over the columns query to make sure that every column has a cell in the table, whether it is empty or not. Same goes for the rows—you can loop over the rows query and you know that there will be corresponding rows in the output table, even if the entire row had no data in the sales data recordset.

Listing B shows how this can be done with some nested loops.

First, you'd loop over the columns query to output all the column headers, in this case each sales performance area. Then, you'd start looping over the reporting periods query. Within each reporting period, you'd loop over the sales performers again. For each combination of reporting period and sales performer, loop through the sales data query looking for a matching record. If you find one, output it and break out of the loop. If you don't find one, output an empty cell instead.

This technique outputs the data just as it should. So you're done? Not quite. A little testing reveals something scary: This code is performing 1,109 loop iterations! Sound a bit excessive for a data set with 8 columns and 6 rows? It should. But consider this: If the recordset grows to 16 columns and 6 rows, the number of loop iterations jumps to 4,514. As the data set grows, the number of loop iterations increases at a geometric rate. Clearly this solution is not scalable.

Handling sparse data: Take two
One way to greatly reduce the number of loops required is to use a structure to hold the sales data. Take a look at Listing C.

This example creates a structure where each key is a combination of reporting period and sales performer, and the value is the sales data for that period and performer. The two values that make up the key are delimited with a pipe symbol.

Here, you'd loop over the performers and sales period queries again, but this time instead of doing another nested loop to find the matching sales data, you'll just look in the structure instead. This is shown in Listing D.

If you run this code on the query created in Listing A, you'll see that it also outputs the data exactly the way you need it. However, this code only does 48 loop iterations instead of 1,109 with the previous method. Doubling the number of columns runs 96 iterations instead of 4,514. In terms of execution time, using the structure is literally dozens of times faster, which is quite a stunning difference.

Useful constructs
ColdFusion structures are useful constructs to store collections of data. But by making the structure key dynamic and concatenating values to build the key, a structure can become an excellent, and fast, data storage and lookup tool. If you need to display sparse data like this, using a structure to avoid nested loops can be a useful technique.

Editor's Picks

Free Newsletters, In your Inbox