The declining importance of performance testing should change your priorities

In this document you'll learn why the declining importance, and therefore the limited gain, of trying to determine the performance and scalability of systems should reduce the priority you place on those activities.

We've reached a day when a fast server can handle almost any single load that is thrown at it. For a few thousand dollars, perhaps $5K to pick a round number, you can have a system that will perform adequately under almost any load. Because of this, performance testing is becoming a luxury that few can afford.

Of course, there are some organizations that will find that their applications cannot be served by a single server or even a small farm of servers. Their tasks will be so large, difficult, or expansive that no single piece of hardware can meet the need. However, the number of organizations with those needs is dramatically shrinking.

Personal performance history

I can vividly remember performance testing an application I was writing. It was a program designed to read the input from up to eight telephone switches simultaneously. Each switch was communicating with the application at 9600 baud.

The application had to verify the record format, writing valid records to one file and invalid records to another. It also had to fail over from the network share it was recording the files to onto a local drive if the network went down. Back then, the network was down for 15 unplanned minutes every month.

The system was running on a 25 MHz 80486SX machine running DOS. It could handle all eight switches flat-out, unless there was a problem. Its peak load during a network failure was five switches. That's well more than the design goal, which was to support three to four switches flat-out, itself a several-hundred-percent increase over the traffic the company was experiencing at the time.

Back then, the extra processing triggered by an error created a snowball effect that degraded the system rapidly. The first lost character would cause extra processing, which would cause at least one more character to be lost.

It was necessary to test the performance of the system because if the system didn't perform, it meant lost revenue. However, even then, the performance of the hardware was starting to pull away from the performance goals that an average company would have.

Why do performance testing?

Performance testing is done for two reasons. The first reason is to ensure that the system will meet the current and short-term projected needs of the business. It establishes how much performance can be extracted from the system as it exists today.

The second reason is to plan for when something must be done in order to support a greater load. This may include rewriting portions of the solution, restructuring the solution, or adding more hardware. Doing performance testing here helps you understand the effort that may be necessary to move past the performance limits of the system as it exists today.

What is performance testing?

When we're talking about performance testing, we're really talking not about a single thing called performance testing. Rather, we're talking about responsiveness, throughput, and scalability. These three items are inseparably linked; they cannot be evaluated except in relationship to one another.

Responsiveness is easy to quantify. Anyone with a stopwatch can quickly determine how long something takes to respond. Of course, even simple automated testing will be more precise, but the fact remains that the responsiveness of an application is the easiest component of performance testing to monitor. It's directly observable, at least at low levels of throughput. The time between the request and the response defines the responsiveness of the system.
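As a minimal sketch, that measurement is easy to automate; the send_request function below is a hypothetical stand-in for whatever actually issues the transaction.

    import time

    def measure_responsiveness(send_request):
        # Record the wall-clock time between issuing the request
        # and receiving the response.
        start = time.perf_counter()
        send_request()
        return time.perf_counter() - start   # seconds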

Throughput is the measure of the number of transactions that the system can complete in a given period of time. There's a specific limit to throughput based on the type of transaction. A given system may only be able to complete 20 transactions per second. No matter how many transactions you throw at the system, it will complete no more than its maximum throughput in a given period of time.

Where responsiveness measures how many seconds it takes to complete a transaction, throughput flips that measurement on its head: it is measured in transactions per second. For instance, say that responsiveness is measured at three seconds per transaction. Putting it all together, if the system completes a maximum of 20 transactions per second and remains responsive at three seconds per transaction, then at any given time there can be 60 transactions being processed on the system.
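That arithmetic is just Little's law, concurrency equals throughput times response time; the sketch below simply restates the numbers from the example.

    def transactions_in_flight(throughput_per_sec, response_time_sec):
        # Little's law: average number of transactions in the system at once
        return throughput_per_sec * response_time_sec

    print(transactions_in_flight(20, 3))   # 60 concurrent transactions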

One of the reasons why responsiveness and throughput are linked is that as the number of transactions flowing through the system increases, the responsiveness of the system decreases. Transactions that used to take less than one second take two or three or more seconds to complete. Perhaps more frustrating is that, generally, beyond a certain rate of transactions the overall throughput drops. If the system is capable of handling 20 transactions per second, but the transactions are coming in at a rate of 25 per second, the overall transaction-handling capacity may fall back to 17 transactions per second. This happens for the same reason that my switch-reading program started losing more characters. The additional processing caused by the additional transactions begins to cause timeouts, retransmissions, and other sequences which reduce the overall throughput.
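A toy calculation shows the shape of that collapse. The assumption that each timed-out transaction burns one extra unit of capacity on retries is invented purely for illustration, not measured from any real system.

    def effective_throughput(capacity, offered_load, retry_cost=1.0):
        # Load beyond capacity times out; the retries themselves
        # consume capacity that could have served real work.
        if offered_load <= capacity:
            return offered_load
        wasted = (offered_load - capacity) * retry_cost
        return max(capacity - wasted, 0)

    print(effective_throughput(20, 25))   # 15: the same qualitative 20-to-17 style drop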

Scalability is the measurement of how large the throughput of the application can be made. Once the maximum throughput is determined, the next step is to determine how much that throughput can be increased by various changes to the environment. Scalability is directly related to bottlenecks. A bottleneck is a part of the system which has reached its maximum throughput; it is the part of the system which is constraining the system from reaching better performance. Scalability options are generally very effective when they are directly related to a bottleneck. Bottlenecks generally fall into one of four categories: processor, memory, disk, or network speed.

In order to determine effective scalability options you must first identify the bottleneck responsible for limiting throughput and then work to eliminate that bottleneck. For instance, if the bottleneck is the processor, the test would be to evaluate the impact of adding another processor. While one can rarely expect throughput to double for every doubling of resources, a high correlation on the order of 0.8 is reasonable, provided there isn't another bottleneck being masked by the first.
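One reading of that rule of thumb is that each doubling of the constrained resource realizes about 80 percent of the ideal 2x gain. The sketch below encodes only that reading; it is not a measured scaling curve.

    def projected_throughput(current_tps, doublings, correlation=0.8):
        # Each doubling of the bottlenecked resource yields roughly
        # correlation * 2x of the previous throughput.
        tps = current_tps
        for _ in range(doublings):
            tps *= 2 * correlation
        return tps

    print(projected_throughput(20, 1))   # 32 tps after one doubling, not the ideal 40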

Bottlenecks only become apparent when they are constraining the performance of the system. If the processor is constraining the throughput of the system, then it will be exceedingly difficult to detect that the installed memory will only support another 10 percent growth in transaction throughput. The net effect is that finding one bottleneck doesn't necessarily accomplish the goal of identifying scalability options. Generally it is necessary to actually test the proposed scalability options to demonstrate the real improvement.

Herein lies the fundamental rub of performance testing: to do the testing correctly you have to have the hardware that you're trying to test scalability for, and at that point you've already bought it.

Evaluating performance

In today's world most applications don't demand performance that exceeds the capabilities of a single system. For instance, Microsoft estimates that a dual-processor 2.4 GHz Intel Pentium 4 running ISA Server is capable of encrypting between 30 and 42 megabits per second of SSL traffic. Given that most organizations have no more than a standard T1 (DS1, if you prefer) at 1.544 megabits per second connecting them to the Internet, there's a huge gap between the capabilities of today's systems and the average demand. While most organizations focus on things like SSL encryption and other tasks known to be difficult for computers, few realize the gap that exists between the capabilities of the systems and the load that they're discussing.
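The size of that gap is easy to check; the numbers below are just the figures quoted above, not new measurements.

    ssl_capacity_mbps = (30, 42)    # ISA Server SSL throughput range quoted above
    t1_bandwidth_mbps = 1.544       # a standard T1/DS1 line

    for capacity in ssl_capacity_mbps:
        print(f"{capacity / t1_bandwidth_mbps:.0f}x the T1's bandwidth")
    # roughly 19x to 27x more encryption capacity than the line can deliver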

The discussion typically centers on a technique or set of techniques which are known to consume more time. For instance, in .NET, using reflection to manipulate objects takes longer than making direct calls to the object. (Reflection is a mechanism for accessing object metadata at run time.) This is a statement of fact. However, what it misses is that reflection, while taking several times the processor time of a direct call, still won't scratch the surface of the available processor time in most applications.
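The Python sketch below is only an analogue: it compares dynamic attribute lookup (getattr) against a direct call, which is not the same machinery as .NET reflection, but it illustrates the "several times slower, yet still negligible" point.

    import timeit

    class Order:
        def total(self):
            return 42

    order = Order()

    direct  = timeit.timeit(lambda: order.total(), number=100_000)
    dynamic = timeit.timeit(lambda: getattr(order, "total")(), number=100_000)

    print(f"direct:  {direct:.4f}s for 100,000 calls")
    print(f"dynamic: {dynamic:.4f}s for 100,000 calls")
    # dynamic dispatch is several times slower per call, yet both totals are a
    # tiny fraction of the processor time available to a typical application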

Similarly, an exception-handling approach of wrap and rethrow is dismissed because it is very processor intensive. Exception handling is expensive in general because it causes a stack walk to find the appropriate handler. The wrap-and-rethrow technique can generate potentially a dozen or more exceptions for each single bottom-level exception that is directly raised by an error. It may be a few hundred times more expensive from a processor perspective than a single exception; however, this ignores the fact that exceptions do (or at least should) happen very infrequently, and the information captured with a wrap-and-rethrow technique is substantially more useful when troubleshooting a problem.
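A minimal sketch of the wrap-and-rethrow pattern in Python; the layer and exception names are invented for illustration. Each layer catches the lower-level error and rethrows it wrapped in its own context, so the final traceback carries the whole story.

    import traceback

    class RepositoryError(Exception):
        pass

    class ServiceError(Exception):
        pass

    def load_record(record_id):
        try:
            raise IOError("disk read failed")            # the bottom-level error
        except IOError as e:
            # wrap and rethrow with data-layer context
            raise RepositoryError(f"could not load record {record_id}") from e

    def get_customer(record_id):
        try:
            return load_record(record_id)
        except RepositoryError as e:
            # wrap and rethrow again with business-layer context
            raise ServiceError("customer lookup failed") from e

    try:
        get_customer(17)
    except ServiceError:
        traceback.print_exc()   # the chained traceback preserves every layer's context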

Another area where performance comes up, and where it is of limited value, is when talking about stored procedures (in a Microsoft SQL Server database). Some organizations require stored procedures for every table access, but not for the security benefits that they can provide; instead they are focused on the performance improvement to be gained through the use of the stored procedure. However, in most situations the use of stored procedures provides little or no performance improvement. The parts of the process that are being skipped (compilation and query optimization) are so efficient in SQL Server that they effectively take no time.
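As a rough sketch using pyodbc against SQL Server (the connection string, table, and procedure names are hypothetical), the two approaches end up looking nearly identical, and SQL Server caches the execution plan in both cases.

    import pyodbc

    conn = pyodbc.connect("DSN=Orders")   # hypothetical connection
    cursor = conn.cursor()

    # Ad-hoc parameterized query: SQL Server compiles once and reuses the cached plan
    cursor.execute("SELECT Name, Total FROM Orders WHERE OrderId = ?", 17)

    # Equivalent stored procedure call: same work, roughly the same cost
    cursor.execute("EXEC GetOrderById ?", 17)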

The list of items like this, where there's a performance focus at most mid-size and large organizations, seems to be endless. While staying conscious of performance is important, because you can become so wasteful with resources that you do cause a performance problem, these situations are most frequently about writing solid code more than they are about using one technique or another.

Hardware is cheap

While it's possible to do performance testing, and do it well, it's generally an expensive, difficult, and time-consuming process that rarely returns value on the investment. In most organizations it's easier and cheaper to add more memory, another processor, or another server to enhance performance.

Similarly, focusing on minor items which impact overall performance (utilization) may cause an inappropriate focus on design ideas that provide minimal performance gain in exchange for a substantial amount of development time. It pains me to say it, but hardware is cheap; utilize it.