I was playing Warcraft 40K: Dawn of War (correction 6/24/06: I mean, Warhammer 40K: Dawn of War) a few days ago, and noticed something intriguing about the victory screen’s statistics: while numerically accurate, they presented an incredibly misleading picture of how the game played out. The victory screen would bold the number in each category that it deemed to be “best,” but in reality, what it called the “best number” was not necessarily so. Numbers and graphs are a very powerful way of presenting information, and sometimes the only way, but if no thought goes into how the information is presented, the result is useless or even misleading.

What Warcraft 40K (correction 6/24/06: again, Warhammer 40K) did wrong, in this case, was to make assumptions about whether a high number was better or worse. For example, the player who lost the fewest units was deemed the “best” in the category. However, that player was knocked out of the game very quickly! In other words, they lost 10 units and I lost 80, but the 10 units they lost were every unit they had, and the 80 I lost were a small percentage of my overall unit count. To put it simply, the units lost count lacks an adequate baseline to be compared against.

Here are just a few examples of how the data you present to your users can be numerically accurate and still fail to tell them what they actually want to know.

To use some more tangible, real world examples, imagine that you are viewing a sales report. You see that a particular sales person sold $500,000 worth of product. That sounds great, right? But what if the previous sales person for that area used to sell $2,000,000 worth of product in the same span of time? What if the competitor’s product is selling $5,000,000 in the same amount of time? What if the other territories have twice the sales, but five times as many accounts? Again, without a proper comparison, the simple sales number of $500,000 is useless or misleading.

Another example of this is stock market indexes. If the DJIA drops 400 points in one day, is that nothing to worry about, or should you start hoarding food? Well, it all depends on what the percentage of the drop is from its starting value. If the starting value was 1,115 then it is time to head for the hills. But if the starting value was 10,516 then 400 points is a much smaller problem.
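The arithmetic behind that comparison is small enough to sketch in a few lines of Python. The function name `percent_drop` is made up for illustration; the index values are the two from the paragraph above.

```python
def percent_drop(starting_value: float, points_lost: float) -> float:
    """Return the drop as a percentage of the starting value."""
    if starting_value <= 0:
        raise ValueError("starting_value must be positive")
    return points_lost / starting_value * 100


# The same 400-point drop measured against the two starting values.
for start in (1_115, 10_516):
    print(f"{start:>6,}: {percent_drop(start, 400):.1f}% drop")
```

The same 400 points come out to roughly 36% of the smaller starting value and under 4% of the larger one, which is the difference between a crash and a bad day.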

Even with a baseline presented, how do you know that the baseline is appropriate? Take market share, for example. What is the relevant way to measure market share? Is it by units sold, or by dollar amount sold? Let us compare two products. Product A has a price of $10 per unit, and Product B has a price of $5 per unit. If Product A sells 500 units, and Product B sells 750 units, then a unit-based market share shows Product A’s market share as 40%. But a dollar amount market share shows Product A’s market share as 57%. That can be some pretty useful math, being able to increase market share by 17 percentage points, simply by changing your definition of market share. But wait, let us add in one more piece of information: one unit of Product A has the efficacy of five units of Product B. In other words, you need to buy five times as much of Product B to do the job of one unit of Product A. Let us re-write our market share calculations in terms of “efficacy sold.” Product A sold 500 units of efficacy, but Product B sold only 150 units of efficacy (750 divided by five). Now, Product A’s market share based on simple efficacy sold is 77%. So, what really is the best way of calculating your market share in this situation? In this case, I would argue in favor of an efficacy-based market share.
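To make the three definitions concrete, here is a minimal Python sketch using the prices and unit counts above. The helper `share_of_a` and the variable names are assumptions made for illustration. Taking the five-to-one efficacy ratio at face value, 750 units of Product B work out to 150 units of efficacy.

```python
def share_of_a(amount_a: float, amount_b: float) -> float:
    """Product A's share of the combined total, as a percentage."""
    return amount_a / (amount_a + amount_b) * 100


units_a, units_b = 500, 750
price_a, price_b = 10, 5
b_units_per_a_unit = 5  # five units of B do the job of one unit of A

by_units = share_of_a(units_a, units_b)
by_dollars = share_of_a(units_a * price_a, units_b * price_b)
by_efficacy = share_of_a(units_a, units_b / b_units_per_a_unit)

print(f"by units:    {by_units:.0f}%")
print(f"by dollars:  {by_dollars:.0f}%")
print(f"by efficacy: {by_efficacy:.0f}%")
```

Nothing about the underlying sales changes between the three lines of output; only the definition of "share" does. That is a good argument for stating the definition right next to the number.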

Percentages are another dangerous area. “Percentages can be tricky?” you ask. Yes, they can be. The value of a percentage all depends upon the relevancy of the total that is being used for comparison. For example, if I say “I am almost 20,” most people will assume that I am only a month or two away from my 20th birthday; after all, most people measure their distance from a certain age based on a 12 month time span. But what if I just turned 19 last month? By the standard measure of “almost” a certain age, I am only 8% of the way to the age of 20. But compared to how long I have lived, I am 95% of the way to the age of 20. Even a number as simple as a percentage can be misleading.

In the same vein, averages can paint a different picture of reality too. Averages have a way of reducing outliers to the status of irrelevant when the data set gets large enough. Imagine that you are managing a call center, and you see an average hold time of thirty seconds. That sounds good. Now, what if you later discovered (deep diving after numerous customer complaints) that for most of the day the hold time was zero seconds, but during peak hours it was actually five minutes? Now your average no longer looks so great; it disguised your need for additional workers during peak hours.
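A quick sketch shows how an average like that can come about. The call counts below are invented for illustration (90 calls answered immediately, 10 peak-hour calls holding for five minutes); they are chosen only because they reproduce a thirty-second average.

```python
from statistics import mean, median

# Hypothetical hold times in seconds: 90 off-peak calls answered
# immediately and 10 peak-hour calls that waited five minutes.
hold_times = {"off-peak": [0] * 90, "peak": [300] * 10}

all_calls = [t for times in hold_times.values() for t in times]

print(f"overall mean:   {mean(all_calls):.0f} s")
print(f"overall median: {median(all_calls):.0f} s")
print(f"worst case:     {max(all_calls)} s")

# Breaking the same data down by period exposes what the mean hides.
for period, times in hold_times.items():
    print(f"{period:>8} mean: {mean(times):.0f} s over {len(times)} calls")
```

Showing the median, the worst case, or a per-period breakdown next to the mean costs very little, and any one of them would have flagged the peak-hour problem before the complaints did.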

Another dangerous data trap is the time span comparison. Take a toy store as an example here. If you compare Quarter 1 2006 sales to Quarter 4 2005 sales (current quarter vs. previous quarter) then you will be panicking every year because you will see a huge drop in sales. But if you compare Quarter 1 2006 to Quarter 1 2005, you are able to see a proper “apples-to-apples” comparison. On the other hand, for a non-cyclical product such as medication for chronic diseases, a current quarter vs. previous quarter comparison gives a more accurate snapshot of sales trends.
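Here is a small Python sketch of the two comparisons for the toy store. The sales figures are invented purely for illustration, and the helper names `pct_change` and `previous_quarter` are assumptions, not part of any library.

```python
# Hypothetical quarterly sales (in thousands of dollars) for a toy store
# with a strong holiday quarter. These figures are invented for illustration.
sales = {
    (2005, 1): 200, (2005, 2): 220, (2005, 3): 250, (2005, 4): 600,
    (2006, 1): 230,
}


def pct_change(current: float, previous: float) -> float:
    """Percentage change from previous to current."""
    return (current - previous) / previous * 100


def previous_quarter(year: int, quarter: int) -> tuple[int, int]:
    """The quarter immediately before (year, quarter)."""
    return (year - 1, 4) if quarter == 1 else (year, quarter - 1)


year, quarter = 2006, 1
now = sales[(year, quarter)]
vs_previous = pct_change(now, sales[previous_quarter(year, quarter)])
vs_last_year = pct_change(now, sales[(year - 1, quarter)])

print(f"Q{quarter} {year} vs. previous quarter:       {vs_previous:+.1f}%")
print(f"Q{quarter} {year} vs. same quarter last year: {vs_last_year:+.1f}%")
```

With these made-up numbers, the quarter-over-quarter figure shows a steep drop while the year-over-year figure shows growth. Which one belongs on the report depends on whether the product is seasonal, and that is something to find out from the users rather than guess.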

What can you, as a developer, do about this?

Your goal is to serve your end users’ need for accurate, reliable, and understandable information. The first thing you should be doing is to always explain the methodology used in language that the vast majority of your users can understand. You can use technical mathematics language if statisticians are going to be using your data, but you cannot do this if your data is headed for a major news Web site. Another thing you should do is to put yourself in the shoes of your end user, or ask them what they need to know. Learn about the numbers. Find out what the numbers actually represent, and the typical data trends for that data.

Typically, first order calculations such as a raw sales number are not very useful. Second order and third order (velocity and acceleration) calculations tend to provide much more information. It is one thing to say that a car is 800 feet down a quarter mile race track (first order number, distance). It is more useful to state that its current speed is 80 MPH. It may be even more useful to say that its acceleration is 9 ft/s². And if you are the race car mechanic, you probably want a plot of the instantaneous velocity and acceleration throughout the race, to find out where the vehicle’s performance needs to be tuned for maximum race speed. Meanwhile, the fans just want to know the total time and trap speed.
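If all you have is the raw first order number sampled over time, the other two can be approximated with simple finite differences. The sketch below is one way to do that in Python; the function name `rate_of_change` and the distance samples are hypothetical, chosen so that the car accelerates at a constant 9 ft/s².

```python
def rate_of_change(times, values):
    """Finite-difference rate of change between consecutive samples.

    Returns the midpoint time of each interval and the average rate over it.
    """
    mid_times, rates = [], []
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        mid_times.append(times[i - 1] + dt / 2)
        rates.append((values[i] - values[i - 1]) / dt)
    return mid_times, rates


times = [0.0, 1.0, 2.0, 3.0, 4.0]         # seconds
distance = [0.0, 4.5, 18.0, 40.5, 72.0]   # feet, hypothetical samples

v_times, velocity = rate_of_change(times, distance)        # ft/s
a_times, acceleration = rate_of_change(v_times, velocity)  # ft/s^2

print("velocity (ft/s):      ", velocity)
print("acceleration (ft/s^2):", acceleration)
```

The same function works on sales figures, market share, or any other series measured over time. Note that each pass loses one data point and amplifies noise in the measurements, so real data may need smoothing before the higher order numbers are worth showing to anyone.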

This brings up the issue of graphical representations of data. We all like charts and graphs. My boss likes to joke with me and ask, “Can I get that as a pie chart?” In reality, many graphs are not as useful as they may appear. Like raw numbers, a graph without an appropriate baseline is not very useful. A chart of current sales figures for a year is good. The same chart showing the total market size and competitors’ performance lets the viewer see if the trend is specific to their product, or the market as a whole. A line chart showing a second or third order calculation over time is excellent (for example: market share over time, change in market share over time, etc.).

You also need to decide how you are going to handle outliers. Are exceptional cases what your users are looking for? Or are exceptional cases to be discarded? Or maybe your users need you to remove the exceptions from the overall data set (so as to not mess up averages), but to point them out elsewhere. Again, you really need to work closely with your users.
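The third option, keeping exceptions out of the averages but still reporting them, might look something like the following Python sketch. It uses Tukey's interquartile-range fences, which is just one common rule of thumb and not the only reasonable choice; the function name `split_outliers`, the multiplier `k`, and the order sizes are all assumptions for illustration.

```python
from statistics import mean, quantiles


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using Tukey's IQR fences."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


# Hypothetical order sizes; the last value is the exceptional case.
orders = [12, 14, 15, 15, 16, 17, 18, 95]
kept, outliers = split_outliers(orders)

print(f"mean of everything:    {mean(orders):.2f}")
print(f"mean without outliers: {mean(kept):.2f}")
print(f"reported separately:   {outliers}")
```

Whatever rule you pick, say so in the methodology. A user who does not know that values were set aside has no way to judge whether the "typical" number applies to their situation.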

There are all sorts of data traps for the unwary, and these are just a few of them. Be on the lookout, and do not be afraid to ask your users questions if you think something in the project spec does not make sense. Most users would rather have you say to them, “I know this is what you asked for, but I am looking at the raw data and I think we can devise a more useful metric” than to be using a product that gives them a misleading idea of the data due to a poor metric selection.

J.Ja