Getting the most from performance analysis of distributed computer systems

Tasked with setting up a performance test for an n-tier distributed system? Before you get started, check out these tips on creating a test environment.

This is the first article in a two-part series that will discuss procedures for analyzing the performance of an n-tier distributed computer system. In part one, we examine components of a test environment and show you how to create a test matrix. The follow-up article will detail test procedures, bottleneck analysis, and how to establish system benchmarks for performance comparison.

Software performance testing is a process-intensive set of tasks. The more time you spend on proper setup, the better your chances of collecting meaningful data, which is essential for system design and architectural scaling.

Know your testing environment
In testing, it’s crucial that you mimic your production system. For example, if you’re trying to collect performance data on a machine where developers are running large compiles several times an hour, chances are the data you collect will be irrelevant. The same can be said if your test machines are connected to a network where someone is hosting a public MP3 collection.

However, exact replication of your production environment, including all hardware and software, could be cost-prohibitive. Within the constraints of your budget, re-create the “live” environment as much as possible.

A well-engineered test environment will make it easier to spot particular issues, such as poorly configured applications and unacceptable throughput. More important, a test system that mimics a production environment provides a foundation for quality test data. It is this data that will be used for performance benchmarking and capacity planning. Part two of this series will deal with these topics in detail.

Here’s what you will need
To set up your testing environment, you’ll need the following components:
  1. A network switch: It is very important to isolate the test environment from the “noise” you typically find on a development network. If you don’t, you will encounter application latencies that would not usually exist in the production environment.
  2. Test client machines: The number depends on the load level you intend to apply to your system, but try to get at least two machines: one to generate load and another to take timing measurements. Latencies measured on a client that is generating load with concurrent processes can include the overhead of managing those processes. If the second machine takes its latency measurements from a single thread while the first machine generates load concurrently, you get a more accurate view of system latency. Of course, the number of machines ultimately needed to generate enough load to determine your distributed system’s peak throughput will depend on both the complexity of the load profile and the capabilities of the system being tested.
  3. Servers: Since architectural decisions will be made from the collected data, use a server architecture similar to the production environment. For example, using a group of Solaris x86 machines to model the performance of a production system running on Solaris SPARC servers will not provide valid results.
  4. Time synchronization service: If you are measuring latency between two machines, your measurements will be accurate only to within the time difference between the clocks on the two machines. To keep the clocks synchronized as closely as possible, some type of time synchronization service is required. The Network Time Protocol (NTP) offers UNIX systems a time synchronization service that can reach submillisecond accuracy, which is quite suitable for most analyses. The Simple Network Time Protocol (SNTP), the protocol used in the Windows Time Service found on most Windows networks, is less accurate and can be off by as much as half a second, depending on installation. Check with your network administrator for the options available to you.
  5. Testing software: There are numerous vendors of load testing software, such as Mercury Interactive, Rational, and Empirix, as well as some open source options. If you have the time and programming prowess, you could write your own solution. Regardless of the tool, each product has a particular feature set and learning curve. The best advice is to understand your requirements before making a decision. For example, almost all testing suites can generate HTTP load, but few are capable of generating SQL statements to load-test a database server directly. Server monitoring capabilities are also quite important. Don’t be fooled into believing Simple Network Management Protocol (SNMP) support will be enough to capture your system’s load levels unless you’ve already invested in a comprehensive SNMP agent suite.
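
To make the clock-synchronization point in item 4 concrete, here is a minimal Python sketch of the standard offset and round-trip delay arithmetic that NTP-style services apply to the four timestamps of one request/reply exchange. The Exchange class, the function names, and the sample numbers are hypothetical and chosen only for illustration. The calculation also assumes the network delay is the same in both directions.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """Four timestamps, in seconds, from one request/reply exchange."""
    t1: float  # client clock: request sent
    t2: float  # server clock: request received
    t3: float  # server clock: reply sent
    t4: float  # client clock: reply received


def clock_offset(x: Exchange) -> float:
    """Estimate server clock minus client clock, assuming symmetric network delay."""
    return ((x.t2 - x.t1) + (x.t3 - x.t4)) / 2.0


def round_trip_delay(x: Exchange) -> float:
    """Time spent on the network, excluding the server's processing time."""
    return (x.t4 - x.t1) - (x.t3 - x.t2)


# Hypothetical exchange: the server clock runs 0.250 s ahead of the client,
# each network leg takes 0.010 s, and the server spends 0.005 s on the request.
sample = Exchange(t1=100.000, t2=100.260, t3=100.265, t4=100.025)

offset = clock_offset(sample)
delay = round_trip_delay(sample)

# Subtracting timestamps taken on two different machines folds the clock
# offset into the result; removing the offset recovers the real one-way time.
naive_one_way = sample.t2 - sample.t1
corrected_one_way = naive_one_way - offset

print(f"clock offset:       {offset:.3f} s")
print(f"round-trip delay:   {delay:.3f} s")
print(f"naive one-way:      {naive_one_way:.3f} s")
print(f"corrected one-way:  {corrected_one_way:.3f} s")
```

In this made-up case the uncorrected one-way figure is dominated by the clock difference rather than by the system under test, which is why unsynchronized clocks can make cross-machine latency numbers meaningless.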

Make a test matrix
When you delve into your distributed application, you need a systematic method of organizing the large number of parameters affecting performance. Protocol options, transaction sizes, parameters passed to the application, even application configuration settings can all affect performance. How do you keep track of it all?

Perhaps the most important step in performance analysis is building a test matrix to capture all of the dimensions that have an impact on your system’s performance. Begin by creating two columns, one for dimensions and another for values. Then, list all of the dimensions that affect performance for the whole system, each function in the system, each subfunction, and so on. Ultimately, this matrix will serve as a top-level overview of the system and allow you to develop smaller matrices for test plans and system documentation.

Figure A shows a simplified test matrix for a document translation Web service that converts a Word document to XML and vice versa.

Figure A

Global dimensions                Values
Document size                    1K-1Meg
Conversion type                  DOC->XML, XML->DOC
Connection protocol              HTTP, HTTPS
Authentication                   UserPass, Certificate

DOC->XML dimensions
Number of paragraphs             1-1000
Number of images                 1-10

XML->DOC dimensions
Number of external references    1-100
CDATA size                       1K-500K

Test matrix for a document translation Web service

Figure A lists eight dimensions, several of which span a large range of values. The next step is to expand this matrix by determining which of these dimensions should be tested against each other. In our example, authentication and conversion type are not related, but document size (the message size) and conversion type are. (See Figure B.)

Figure B

              1K    10K    100K    1Meg
DOC->XML      ✓     ✓      ✓       ✓
XML->DOC      ✓     ✓      ✓       ✓

Message size and conversion type
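
A small matrix like Figure B can also be generated rather than typed by hand. The following Python sketch crosses the two related dimensions into a flat list of test cases. The values come from Figure B, while the variable names and the expand helper are hypothetical.

```python
from itertools import product

# Values taken from Figure B; the names are illustrative only.
conversion_types = ["DOC->XML", "XML->DOC"]
message_sizes = ["1K", "10K", "100K", "1Meg"]


def expand(*dimensions):
    """Return every combination of the given dimension values as a list of tuples."""
    return list(product(*dimensions))


test_cases = expand(conversion_types, message_sizes)

for conversion, size in test_cases:
    print(f"{conversion:10} {size}")
print(len(test_cases), "test cases")
```

Each tuple corresponds to one marked cell in the figure. Passing a third list to expand would cross in another related dimension in the same way.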

Obviously, complex systems will require more intricate test matrices. These matrices can expand to several pages with 20 to 30 columns and demand considerable test time. Use the matrix to communicate potential issues to developers as well as to senior management.
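
To get a feel for how quickly test time grows, a rough estimate helps. The Python sketch below multiplies the number of values per dimension to count matrix cells, then scales by repetitions and run length. Every number here is an assumption made for illustration: document size is sampled at four points, each cell is run three times, and each run takes 10 minutes. None of these figures comes from a real test plan.

```python
from math import prod


def matrix_cells(value_counts):
    """Number of cells when every listed dimension is crossed with every other."""
    return prod(value_counts)


def estimated_hours(value_counts, repetitions, minutes_per_run):
    """Total wall-clock hours if runs execute one after another."""
    return matrix_cells(value_counts) * repetitions * minutes_per_run / 60


# Assumed value counts for the four global dimensions in Figure A:
# 4 sampled document sizes, 2 conversion types, 2 protocols, 2 authentication modes.
global_counts = [4, 2, 2, 2]

cells = matrix_cells(global_counts)
hours = estimated_hours(global_counts, repetitions=3, minutes_per_run=10)

print(f"{cells} cells, about {hours:.1f} hours of test time")
```

Fully crossing every dimension is an upper bound. Testing only the dimensions that are actually related, as in Figure B, is what keeps the total manageable.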

Proper planning can make the difference between meaningful test results and a collection of useless data. In this article, we’ve taken a look at the components of a high-quality test environment and shown you how to implement a test matrix. The next installment discusses creating a thorough test plan, deciding what to include in the testing process, and establishing performance benchmarks.
