We’ve reached a day when a fast server can handle almost any single load that is thrown at it. For a few thousand dollars, perhaps $5K to pick a round number, you can have a system that will perform adequately under almost any load. Because of this, performance testing is becoming a luxury that few can afford.

Of course, some organizations will find that their applications cannot be served by a single server or even a small farm of servers. Their tasks are so large, difficult, or expansive that no single piece of hardware can meet the need. However, the number of organizations with those needs is shrinking dramatically.

In this document you’ll learn why the declining importance, and therefore the limited gain, of trying to determine the performance and scalability of systems should reduce the priority you place on those activities.

Personal performance history

I can
vividly remember performance testing an application I was writing. It was a
program designed to read the input from up to eight telephone switches
simultaneously. Each switch was communicating with the application at 9600
baud.

The application had to verify the record format, writing valid records to one file and invalid records to another. It also had to fail over from the network share where it recorded the files to a local drive if the network went down. Back then the network was down for 15 unplanned minutes every month.

The system ran on a 25 MHz 80486SX machine running DOS. It could handle all eight switches flat-out unless there was a problem; its peak load during a network failure was five switches. That was well beyond the design goal of supporting three to four switches flat-out, which itself represented a several-hundred-percent increase over the traffic the company was experiencing at the time.

Back then the extra processing that an error caused created a snowball effect that degraded the system rapidly. The first lost character triggered extra processing, which in turn caused at least one more character to be lost.

It was necessary to test the performance of the system because if the system didn’t perform, it meant lost revenue. Even then, however, the performance of the hardware was starting to pull away from the performance goals an average company would have.

Why do performance testing?

Performance testing is done for two reasons. The first is to ensure that the system will meet the current and short-term projected needs of the business: to establish how much performance can be extracted from the system as it exists today.

The
second reason is to plan for when something must be done in order to support a
greater load. This may include rewriting portions of the solution,
restructuring the solution, or adding more hardware. Doing performance testing
here helps you understand the effort that may be necessary to move past the
performance limits of the system as it exists today.

What is performance testing?

When we talk about performance testing, we’re not really talking about a single thing. Rather, we’re talking about responsiveness, throughput, and scalability. These three items are inseparably linked; they cannot be evaluated except in relation to each other.

Responsiveness is easy to quantify. Anyone with a stopwatch can quickly determine how long something takes to respond. Of course, even simple automated testing will be more precise, but the fact remains that the responsiveness of an application is the easiest component of performance testing to monitor. It’s directly observable, at least at low levels of throughput. The time between the request and the response defines the responsiveness of the system.
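For a concrete illustration of what even simple automated measurement looks like, here is a minimal C# sketch that times a single request from send to response; the URL is a hypothetical placeholder for whatever transaction you care about.

```csharp
// A minimal responsiveness check: time a single request from send to response.
// The URL below is a placeholder, not a real endpoint.
using System;
using System.Diagnostics;
using System.Net;

class ResponsivenessCheck
{
    static void Main()
    {
        Stopwatch timer = Stopwatch.StartNew();

        using (WebClient client = new WebClient())
        {
            client.DownloadString("http://example.com/order-status"); // the "transaction"
        }

        timer.Stop();
        Console.WriteLine("Response time: {0:F0} ms", timer.Elapsed.TotalMilliseconds);
    }
}
```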

Throughput is the measure of the number of transactions the system can complete in a given period. There’s a specific limit to throughput based on the type of transaction. A given system may only be able to complete 20 transactions per second; no matter how many transactions you throw at it, it will complete no more than its maximum throughput in a given period of time.

Where responsiveness measures how many seconds it takes to complete a transaction (say, three seconds per transaction), throughput flips that measurement on its head: it is measured in transactions per second. Putting it all together, if the system completes a maximum of 20 transactions per second and remains responsive at three seconds per transaction, then at any given time there can be 60 transactions being processed on the system.
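The arithmetic behind that figure (transactions in flight equals throughput multiplied by response time) is easy to sanity-check; here is a trivial sketch using the numbers above.

```csharp
// Concurrency = throughput (transactions/second) x response time (seconds).
// Using the figures from the text: 20 tx/sec at 3 seconds each.
using System;

class ConcurrencyEstimate
{
    static void Main()
    {
        double throughputPerSecond = 20.0; // maximum transactions completed per second
        double responseTimeSeconds = 3.0;  // observed responsiveness per transaction

        double inFlight = throughputPerSecond * responseTimeSeconds;

        Console.WriteLine("Transactions in flight at any moment: {0}", inFlight); // 60
    }
}
```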

One of the reasons responsiveness and throughput are linked is that as the number of transactions flowing through the system increases, the responsiveness of the system decreases. Transactions that used to take less than one second take two or three or more seconds to complete. Perhaps more frustrating is that, beyond a certain rate of transactions, the overall throughput generally drops. If the system is capable of handling 20 transactions per second but transactions are arriving at 25 per second, the overall transaction-handling capacity may fall back to 17 transactions per second. This happens for the same reason my switch-reading program started losing more characters: the additional processing caused by the additional transactions begins to cause timeouts, retransmissions, and other sequences which reduce the overall throughput.

Scalability is the measurement of how large the throughput of the application can be made. Once the maximum throughput is determined, the next step is to determine how much that throughput can be increased by various changes to the environment. Scalability is directly related to bottlenecks. A bottleneck is the part of the system that has reached its maximum throughput and is constraining the system from reaching better performance. Scalability options are generally effective only when they are directly related to a bottleneck. Bottlenecks generally fall into the categories of processor, memory, disk, or network speed.
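One way to watch those four categories on a Windows server is through the standard performance counters. The following is a minimal C# sketch using System.Diagnostics.PerformanceCounter with the stock counter names; the network counters are omitted because they require an adapter-specific instance name.

```csharp
// A rough bottleneck check using standard Windows performance counters.
using System;
using System.Diagnostics;
using System.Threading;

class BottleneckSnapshot
{
    static void Main()
    {
        PerformanceCounter cpu  = new PerformanceCounter("Processor", "% Processor Time", "_Total");
        PerformanceCounter mem  = new PerformanceCounter("Memory", "Available MBytes");
        PerformanceCounter disk = new PerformanceCounter("PhysicalDisk", "% Disk Time", "_Total");

        cpu.NextValue();  // rate counters need a first sample to prime them
        disk.NextValue();
        Thread.Sleep(1000);

        Console.WriteLine("Processor busy:   {0:F1}%", cpu.NextValue());
        Console.WriteLine("Available memory: {0:F0} MB", mem.NextValue());
        Console.WriteLine("Disk busy:        {0:F1}%", disk.NextValue());
    }
}
```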

In order to determine effective scalability options you must first identify the bottleneck responsible for limiting throughput and then work to eliminate it. For instance, if the bottleneck is the processor, the test would be to evaluate the impact of adding another processor. While one can rarely expect throughput to double for every doubling of resources, a scaling factor on the order of 0.8 is reasonable, provided there isn’t another bottleneck being masked by the first.
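To make that estimate concrete, here is a back-of-the-envelope sketch. It reads the 0.8 figure as "each added processor contributes about 80 percent of a processor's worth of throughput," which is one plausible interpretation rather than a guarantee.

```csharp
// Back-of-the-envelope scaling estimate under the 0.8 assumption described above.
using System;

class ScalingEstimate
{
    static void Main()
    {
        double currentThroughput = 20.0; // tx/sec with one processor (the figure used earlier)
        double scalingFactor     = 0.8;  // fraction of the ideal gain actually realized

        double withSecondCpu = currentThroughput * (1 + scalingFactor);

        Console.WriteLine("Estimated throughput with a second processor: {0:F0} tx/sec", withSecondCpu);
        // Prints 36 -- short of the ideal 40, and valid only if no other bottleneck is masked.
    }
}
```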

Bottlenecks only become apparent when they are constraining the performance of the system. If the processor is constraining throughput, it will be exceedingly difficult to detect that the installed memory will only support another 10 percent growth in transaction throughput. The net effect is that finding one bottleneck doesn’t necessarily accomplish the goal of identifying scalability options. Generally it is necessary to actually test the proposed scalability options to demonstrate the improvement.

Herein lies the fundamental rub of performance testing: to do the testing correctly you have to have the hardware you’re trying to test scalability for, and at that point you’ve already bought it.

Evaluating performance

In today’s world most applications don’t demand performance that exceeds the capabilities of a single system. For instance, Microsoft estimates that a dual-processor 2.4 GHz Intel Pentium 4 running ISA Server is capable of encrypting between 30 and 42 megabits per second of SSL traffic. Given that most organizations have no more than a standard T1 (DS1, if you prefer) at 1.544 megabits per second connecting them to the Internet, there’s a huge gap between the capabilities of today’s systems and the average demand. While most organizations focus on things like SSL encryption and other tasks known to be difficult for computers, few realize how large the gap is between the capabilities of the systems and the load they’re discussing.

The discussion typically centers on a technique or set of techniques known to consume more time. For instance, in .NET, using reflection to manipulate objects takes longer than making direct calls to the object. (Reflection is a mechanism for accessing object metadata at run time.) This is a statement of fact. However, what it misses is that reflection, while taking several times the processor time of direct calls, still won’t scratch the surface of the available processor time in most applications.
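To make the comparison concrete, here is a minimal C# sketch that times a direct call against a reflection call; the Order class and the iteration count are purely illustrative.

```csharp
// Comparing a direct call with a reflection call. Reflection is slower per call,
// but even at a million calls the absolute cost is small relative to what a
// modern processor can do -- which is the point of the argument above.
using System;
using System.Diagnostics;
using System.Reflection;

class Order
{
    public decimal Total() { return 42m; }
}

class ReflectionCost
{
    static void Main()
    {
        Order order = new Order();
        MethodInfo total = typeof(Order).GetMethod("Total");
        const int iterations = 1000000;

        Stopwatch direct = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) order.Total();
        direct.Stop();

        Stopwatch reflected = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) total.Invoke(order, null);
        reflected.Stop();

        Console.WriteLine("Direct:     {0} ms", direct.ElapsedMilliseconds);
        Console.WriteLine("Reflection: {0} ms", reflected.ElapsedMilliseconds);
    }
}
```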

Similarly, an exception-handling approach of wrap and rethrow is dismissed because it is very processor intensive. Exception handling is expensive in general because it causes a stack walk to find the appropriate handler, and the wrap-and-rethrow technique can generate a dozen or more exceptions for each bottom-level exception directly raised by an error. It may be a few hundred times more expensive from a processor perspective than a single exception. However, this ignores the fact that exceptions do (or at least should) happen very infrequently, and the information captured with a wrap-and-rethrow technique is substantially more useful when troubleshooting a problem.
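As a sketch of the technique being discussed (the class, method, and exception types here are illustrative choices, not a prescription):

```csharp
// Wrap and rethrow: catch the lower-level exception and rethrow it wrapped in
// one that adds context. More expensive than a bare throw, but it only runs
// when something has already gone wrong.
using System;
using System.IO;

class OrderRepository
{
    public string Load(string path)
    {
        try
        {
            return File.ReadAllText(path);
        }
        catch (IOException ex)
        {
            // Wrap the low-level failure with what we were trying to do.
            throw new ApplicationException("Could not load order file '" + path + "'.", ex);
        }
    }
}
```

At the top of the call stack, walking InnerException then yields the chain of context that a single bottom-level exception would not provide.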

Another area where performance comes up, and where the concern is of limited value, is stored procedures (in a Microsoft SQL Server database). Some organizations require stored procedures for every table access, not for the security benefits they can provide, but because they are focused on the performance improvement to be gained through the use of stored procedures. However, in most situations the use of stored procedures provides little or no performance improvement. The parts of the process that are being skipped (compilation and query optimization) are so efficient in SQL Server that they effectively take no time.
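For reference, here are the two access patterns side by side in a minimal ADO.NET sketch; the connection string, procedure name, and table are placeholders.

```csharp
// A stored procedure call versus a parameterized ad-hoc query.
// In most cases SQL Server caches and reuses the execution plan for both,
// which is why the performance difference is usually small.
using System;
using System.Data;
using System.Data.SqlClient;

class CustomerLookup
{
    static void Main()
    {
        string connectionString = "Server=.;Database=Sales;Integrated Security=true";

        using (SqlConnection connection = new SqlConnection(connectionString))
        {
            connection.Open();

            // Via a stored procedure.
            using (SqlCommand proc = new SqlCommand("GetCustomerByID", connection))
            {
                proc.CommandType = CommandType.StoredProcedure;
                proc.Parameters.AddWithValue("@CustomerID", 42);
                Console.WriteLine(proc.ExecuteScalar());
            }

            // Via a parameterized query.
            using (SqlCommand query = new SqlCommand(
                "SELECT Name FROM Customers WHERE CustomerID = @CustomerID", connection))
            {
                query.Parameters.AddWithValue("@CustomerID", 42);
                Console.WriteLine(query.ExecuteScalar());
            }
        }
    }
}
```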

The list of items like this, where there’s a performance focus at most mid-size and large organizations, seems endless. While staying conscious of performance is important, because you can become so wasteful with resources that you do cause a performance problem, these situations are most frequently about writing solid code rather than about using one technique or another.

Hardware is cheap

While it’s possible to do performance testing, and to do it well, it’s generally an expensive, difficult, and time-consuming process that rarely returns value on the investment. In most organizations it’s easier and cheaper to add more memory, another processor, or another server to enhance performance.

Similarly, focusing on minor items that affect overall performance (utilization) may cause an inappropriate focus on design ideas that provide minimal performance gains in exchange for a substantial amount of development time. It pains me to say it, but hardware is cheap: use it.