If load-testing results and capacity planning suggest that you need to increase the number of Web and application servers in your current infrastructure, you may be tempted to lob statistical salvos at senior management to prompt them to open the purse strings. Don’t. A better approach is to provide simple statistics and graphs that highlight critical points of failure and illustrate performance degradation that may dramatically affect the end-user experience and compromise brand status. Distilling key data will be more effective than teaching your boss Statistics 101.
Keep the stats simple
If you’re going to present statistics to senior management, keep them simple and always accompany them with a graphical presentation, said Kent Langley, manager of technology infrastructure at CNET, Inc.
“I use mean, median, standard deviation, maximum, and minimum. Those are some of the most basic statistics. Unless you get into fancy predictions, which are usually wrong, that’s all you need.”
Avoid analysis paralysis by keeping your statistics simple. Here are a few measures that should get you through most situations. Even if you don’t think you will need to use these, they’re worth a glance, just in case they are thrown at you.
Measures of variability
As the name suggests, these values give you a sense of how varied your data is. The more consistent and less varied your numbers, the better.
- Minimum and maximum values—These are your highest and lowest values.
- Range—This is the difference between the high and low values.
- Standard deviation (and variance)—A detailed explanation is beyond the scope of this article, but this is essentially a value that indicates how much the numbers vary from the mean. Smaller is better. The variance is the standard deviation squared.
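All of these measures are one-liners in most languages. Here's a minimal sketch in Python's standard `statistics` module, using made-up response times (in milliseconds) from a hypothetical load test—the numbers are purely illustrative:

```python
import statistics

# Hypothetical load-test sample: response times in milliseconds
response_times = [120, 135, 110, 480, 125, 130, 140, 115]

low, high = min(response_times), max(response_times)
print(low, high)        # minimum and maximum values: 110 480
print(high - low)       # range: 370

# Sample standard deviation; variance is the standard deviation squared
print(statistics.stdev(response_times))
print(statistics.variance(response_times))
```

Note the single 480 ms outlier: it inflates the range and the standard deviation, which is exactly the kind of variability a bare average would hide.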
Measures of central tendency
These measures offer a single number that your data is centered around.
- Mean (actually the arithmetic mean)—This is a simple average of all of your values.
- Median—If you order each value collected from lowest to highest, this is the value in the middle of the list (or the average of the two middle values if the count is even). This value can be useful if the mean is distorted by outlying data.
- Mode—This is the most frequently occurring value among your data.
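Using the same hypothetical sample as above, a quick sketch shows why the median is the safer number to quote when one outlier distorts the mean (the figures are illustrative, not real business data):

```python
import statistics

# Same hypothetical load-test sample; 480 is an outlier
response_times = [120, 135, 110, 480, 125, 130, 140, 115]

print(statistics.mean(response_times))    # arithmetic mean: 169.375
print(statistics.median(response_times))  # middle value: 127.5

# Mode needs repeated values to be meaningful; a separate toy sample:
print(statistics.mode([1, 2, 2, 3]))      # most frequent value: 2
```

The mean (about 169 ms) suggests much slower responses than seven of the eight samples actually saw; the median (127.5 ms) is a truer picture of the typical request.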
Adrian Dorsman, a performance technologist for Web services provider Grand Central, agreed with the strategy of keeping the statistical analysis short and straightforward.
“The less information I need to relay regarding software optimization or a need for more hardware, the easier it is to communicate. One graph and an overview of how I got the data is what I strive for.”
Dorsman learned this lesson the hard way.
“When I first started, my docs were usually 20 pages long, with tons of information that I found quite valuable. However, nobody read it or comprehended what I was saying. It’s necessary to identify the most important data and focus on that. Leave the other stuff for conversation.”
Getting your point across
Even the most brilliantly assembled collection of data won’t win senior management’s approval if they nod off during your delivery.
“I mentioned the term ‘standard deviation’ in a meeting with senior managers and I could see their eyes becoming glassy,” Langley said. “I just wanted them to understand that you can’t build a site or service infrastructure based on averages. You absolutely must do your best to plan for the real dataset.”
He also noted that presenting complicated statistics might sometimes make management suspicious.
“If you try to explain that the data point falls within a statistically relevant distribution as related by how far away from the mean the point is in terms of standard deviations, they’ll think you’re trying to put something over on them,” he said. “If I make a chart or graph or PowerPoint presentation that I can point to and say below this number is bad and above this number is good, the point gets across more effectively.”
A quick example
Let’s say you want to build a buffer in your capacity so that your network can support unplanned spikes in traffic. Figure A shows load vs. time of day. Note that the numbers in the graph do not represent actual business data.
Figure A: Application server capacity
If each instance of your application servers supports 500 session requests per minute and you have eight application servers, the capped maximum is 4,000 sessions per minute. Historically, the most sessions you have supported simultaneously is 2,500, and the average daily traffic is 1,500 at peak times.
Instead of diving into standard deviations and arguing with managers who want to focus on the average number of session requests per minute, you could point to the graph and say that the buffer from 2,500 to 4,000 allows you to accommodate spikes. Given a peak of 2,500 sessions per minute (based on historical data), you need at least six application servers just to meet minimum requirements if one fails going into a peak period. The additional two provide a buffer when the unannounced marketing campaign drives traffic past the historical peak. Put in those terms, what is at stake for the business becomes very clear.
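The arithmetic behind that argument is simple enough to script. Here's a sketch using the figures from the example above (which, as noted, do not represent actual business data):

```python
import math

SESSIONS_PER_SERVER = 500   # per-instance capacity, sessions/minute
SERVERS = 8
HISTORICAL_PEAK = 2500      # highest observed sessions/minute

# Total capped capacity with all servers healthy
total_capacity = SERVERS * SESSIONS_PER_SERVER
print(total_capacity)       # 4000

# Servers needed to cover the historical peak, plus one spare
# so a single failure going into a peak period doesn't sink you
needed = math.ceil(HISTORICAL_PEAK / SESSIONS_PER_SERVER) + 1
print(needed)               # 6

# Headroom left for unplanned spikes (the unannounced campaign)
print(total_capacity - HISTORICAL_PEAK)  # 1500
```

Framing it this way—"six servers is the floor, the other two are our spike buffer"—is the single chart-friendly number that survives a management meeting.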
“If you’re in a company that runs a TV campaign without telling the engineers and then all of a sudden traffic jumps by 10x one day, you can be caught with your pants down quite easily. This is where [initiatives] like Akamai’s value proposition become quite intriguing,” Dorsman said. (Akamai is a company specializing in enhancing the performance of Web content delivery.) “Perhaps some day, a Web services [architecture] may have a reserve of computing power on some distributed network to accommodate any massive peaks in system utilization. That would be an interesting technology, to say the least.”
Stats to the rescue
Have numbers ever saved the day for you? What suggestions do you have for getting the number across without putting your boss to sleep? Send us an e-mail with your story and suggestions or post a comment below.