
Benchmarking basics

Companies often cite statistics for their products that are benchmarketing rather than benchmarking. By following the guidelines presented by James McPherson in this Daily Drill Down, you'll be on the way to divesting a product of its hype.


Comparing products has never been easy. There’s not much difference between the spiels of 19th-century snake oil salesmen and the product slicks handed out by marketers today. Both make dubious claims that are difficult to confirm or deny, claims intended to make you purchase their products.

Lesson 1. Benchmarking vs. benchmarketing: Lies, damn lies, and statistics
The goal of true, honest benchmarking is to compare only the component or software in question, creating a relevant apples-to-apples comparison. By contrast, benchmarketing attempts to show the product in an unfairly favorable light by creating the illusion of an apples-to-apples comparison, all the while stacking the deck in the company’s favor. Here is an example of benchmarketing:

Apple-to-apple comparison?

                   Washington apple (1)           Hawaiian apple (2)
Average weight     4 oz.                          20 oz.
Survivable drop    24 in.                         60 in.
Appearance         Bright outer layer but         Dull outer layer with
                   fruit is pale                  festive inner fruit
Disadvantages      Turns unappealing brown        Thick protective outer layer
                   shortly after exposure to air  can be difficult to open

(1) Granny Smith variant   (2) Pine variant

I hope you noticed we compared a Granny Smith apple with a pineapple. Welcome to the land of fine print and weasel words. Unfortunately, benchmarketing is usually much more insidious than our example and relies on the subtler aspects of system performance.

Benchmark theory
A valid benchmark requires a balanced playing field and an understanding of what it is you really want to test. Generating interesting numbers and colorful graphs is easy; determining a meaningful test takes a bit more work. While we could break benchmarking into nearly infinite categories, I’ll focus on the more common situations of testing a single component (a CPU), a home/office PC system, and an application (a database).
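
As a concrete starting point, here is a minimal timing harness of the kind that keeps “interesting numbers” honest: it runs the same workload several times and reports a spread rather than a single figure. The workload function and repetition count below are placeholders you would swap for your own test.

```python
import statistics
import time

def run_benchmark(workload, repetitions=5):
    """Run a workload several times and summarize the timings.

    workload: any zero-argument callable representing the test.
    repetitions: how many trials to run; more trials smooth out noise.
    """
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }

if __name__ == "__main__":
    # Placeholder workload: sum a million integers.
    print(run_benchmark(lambda: sum(range(1_000_000))))
```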

Step one is not only to identify the target of the test but to define the goals. The most fundamental goal is comparison with similar products. Comparing a $90,000 Ferrari and a $24,000 Mustang for speed isn’t a fair test. By the same token, the Ferrari and the Mustang would both be bested in towing capacity when compared against a diesel truck. The rather esoteric naming schemes of computer components make it far more difficult for readers to determine what is or is not a comparable product, which makes this naming game a favorite trick of benchmarketers.

Product goals are typically price/performance, specialized performance, flexibility, or reliability. The tests should determine if the product meets its goals, and, in a perfect world, if the goals were even worthwhile.

Theory: Component testing
Testing a new component requires two kinds of tests: tests that affect only the specific target without being influenced by any other subsystem, and a second set that shows how well the component integrates into the whole system. When comparing multiple variants of the same component at the same time (such as several processors), identical supporting components should be used whenever possible. Some particularly scrupulous testers with sufficient time will actually reuse the same supporting components across test systems to ensure that no one system is saddled with a lemon. When identical parts aren’t possible (as with motherboards for incompatible processors), the components should be as similar as possible.

In the case of the CPU, you must run math-intensive applications designed to test the different aspects of the processor separately and as a whole. These programs must emulate real-world operations without involving the rest of the computer. In particular, you must make sure the test applications fit into the available system memory so the hard drive’s load time and disk caching performance don’t get in the way.
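
As a rough illustration (not a substitute for a real CPU test suite), the sketch below sizes its working set explicitly so it stays in RAM and times an integer pass and a floating-point pass separately. The working-set size is an arbitrary assumption you would tune to the machine under test.

```python
import time

WORKING_SET = 1_000_000  # elements; keep this well below available RAM so the
                         # hard drive and disk cache stay out of the measurement

def integer_pass(data):
    # Integer-heavy loop: sum of squares.
    return sum(x * x for x in data)

def float_pass(data):
    # Floating-point-heavy loop: scaled running total.
    total = 0.0
    for x in data:
        total += (x % 97) * 0.5
    return total

def timed(label, fn, data):
    start = time.perf_counter()
    fn(data)
    print(f"{label}: {time.perf_counter() - start:.3f} s")

if __name__ == "__main__":
    data = list(range(WORKING_SET))  # built once, so allocation isn't timed
    timed("integer pass", integer_pass, data)
    timed("floating-point pass", float_pass, data)
```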

You test the CPU’s ability to work with other components by running several common processor-intensive applications. The most common ones are business application suites, compiling software from source code, and games. Games are very susceptible to differences in video cards, while business applications tend to be more vulnerable to hard drive performance. All applications are affected by memory performance.
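
A simple way to capture that whole-system behavior is to time a complete application run from end to end. The sketch below does this for a hypothetical compile job; the command and project path are assumptions, and you would substitute whatever business suite, build, or game benchmark you actually care about.

```python
import subprocess
import time

# Hypothetical real-world workload: compiling a project from source.
# Replace the command with whatever application run you actually care about.
COMMAND = ["make", "-C", "/path/to/project"]

def time_command(command, runs=3):
    """Time a full application run several times, wall-clock."""
    for i in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        print(f"run {i + 1}: {time.perf_counter() - start:.1f} s")

if __name__ == "__main__":
    time_command(COMMAND)
```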

Theory: System testing
System testing generally involves testing the common components (CPU, hard drive, video) separately and then running applications typical for that type of system. The general system types are home/office, games/multimedia, workstation, and server.

Gaming/multimedia computers have high-speed video cards and CPUs, significant amounts of memory, and multichannel audio cards. Cost is less of an issue than image quality and high frame rates, with game performance and/or video quality being the primary test criteria.

Workstations are typically used by programmers and CAD/CAM drafters needing large amounts of memory and high-performance processors. Video quality is paramount, but frame rates do not need to be spectacular. Business software is often tested on workstations to provide a common reference to other business machines, but 3-D modeling or software compilation tests are the real focus.

Servers are a catchall category for systems that provide services to remote users. As a result, they typically have rudimentary video components with high-speed disk and networking systems. Performance tests are nearly custom events on these machines as the configuration of the operating system and applications has such an incredible impact. Fortunately for the corporate benchmarker, many companies will provide “loaner” servers for a week or two to allow testing in your particular environment.
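
For a loaner server, even a crude load generator can produce comparable numbers across configurations. The sketch below hammers a hypothetical Web server with a fixed number of concurrent clients and reports throughput and average latency; the URL, client count, and request count are all assumptions to adjust for your own environment.

```python
import concurrent.futures
import time
import urllib.request

# Hypothetical target: a loaner server being evaluated on your own network.
URL = "http://test-server.example.com/"
CONCURRENT_CLIENTS = 20
REQUESTS_PER_CLIENT = 50

def client(n_requests):
    # Each client fetches the page repeatedly and records per-request latency.
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        with urllib.request.urlopen(URL) as response:
            response.read()
        latencies.append(time.perf_counter() - start)
    return latencies

if __name__ == "__main__":
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(CONCURRENT_CLIENTS) as pool:
        results = pool.map(client, [REQUESTS_PER_CLIENT] * CONCURRENT_CLIENTS)
    all_latencies = [t for batch in results for t in batch]
    elapsed = time.perf_counter() - start
    print(f"{len(all_latencies)} requests in {elapsed:.1f} s "
          f"({len(all_latencies) / elapsed:.1f} req/s), "
          f"average latency {sum(all_latencies) / len(all_latencies) * 1000:.0f} ms")
```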

The home/office system targets flexibility and business applications with a focus on price/performance. Testing these types of computers is a bit more difficult because the system will probably be used in a number of different situations. All hardware components are evaluated, as are game performance and business software performance.

Theory: Application testing
Application testing differs from component and system testing. In addition to testing the program’s performance on operations relevant to its target market, you also have to choose an appropriate hardware platform. Business software, like spreadsheets, should be tested on a home/office system with files of varying complexity. Games and video-editing programs obviously run best on multimedia systems; likewise, CAD/CAM and programming suites find their best homes on workstations. Databases and other remotely accessed services, like Web and e-mail, fit servers. As with testing server hardware, server-oriented software is highly dependent on the server’s overall configuration.

A database application needs to be tested under conditions appropriate for your needs. Criteria include maximum storable data, maximum number of concurrent connections, search speed, and data integrity. For a library database storing large numbers of rarely modified entries, you’ll be more concerned with search times; for a customer service database with a number of agents modifying user data frequently, you’ll likely want to test its capacity for a large number of connections and how well it prioritizes changes to prevent data corruption. In this case, speed may be sacrificed for stability, in the spirit of the old adage of the tortoise and the hare. Hardware will have to be carefully evaluated, as weak networking could hamper data transfers and reduce the effective number of concurrent connections. Search performance also improves when there is sufficient RAM for the indexes to be stored in memory.
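
To make that concrete, the sketch below uses SQLite purely as a stand-in for whichever database you are actually evaluating, measuring lookup latency under several concurrent readers. The row count, reader count, and query mix are assumptions; a real test would reproduce your own schema and access patterns.

```python
import concurrent.futures
import random
import sqlite3
import time

DB_FILE = "benchmark.db"      # throwaway test database
ROWS = 100_000                # size of the sample data set
READERS = 10                  # simulated concurrent agents
QUERIES_PER_READER = 200

def build_database():
    # Populate a simple indexed table to search against.
    conn = sqlite3.connect(DB_FILE)
    conn.execute("CREATE TABLE IF NOT EXISTS books (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("DELETE FROM books")
    conn.executemany("INSERT INTO books (id, title) VALUES (?, ?)",
                     ((i, f"title-{i}") for i in range(ROWS)))
    conn.commit()
    conn.close()

def reader(queries):
    # Each simulated agent opens its own connection, as real clients would.
    conn = sqlite3.connect(DB_FILE)
    latencies = []
    for _ in range(queries):
        key = random.randrange(ROWS)
        start = time.perf_counter()
        conn.execute("SELECT title FROM books WHERE id = ?", (key,)).fetchone()
        latencies.append(time.perf_counter() - start)
    conn.close()
    return latencies

if __name__ == "__main__":
    build_database()
    with concurrent.futures.ThreadPoolExecutor(READERS) as pool:
        results = pool.map(reader, [QUERIES_PER_READER] * READERS)
    latencies = sorted(t for batch in results for t in batch)
    print(f"median lookup: {latencies[len(latencies) // 2] * 1000:.2f} ms, "
          f"worst: {latencies[-1] * 1000:.2f} ms")
```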

Benchmarking gotchas
  1. Beware of test systems bearing multiple installations.
    The best testers prefer to run all their tests on the same systems to ensure the tests are performed fairly. Great. Wonderful. Have a cookie. Just don’t be lazy and run tests on concurrently loaded operating systems. Hard drive transfers are faster for files located at the outer edge of a disk than for files near the center: the platter spins at a constant speed, and because the inner tracks are physically shorter than the outer ones, less data passes under the heads per revolution near the center. (A rough sketch for observing this zone effect appears after this list.)

    Smart readers have realized that if you have multiple operating systems or applications installed on a machine, they don’t all land on equally fast parts of the disk, so whichever installation occupies the faster outer region (typically the first partition created) gets a free performance bonus. The effect is most apparent with applications that keep large amounts of data on the drive. Operating systems are affected as well through the location of the swap file, which impacts all aspects of system performance.

    The answer is to always perform a clean installation before each package is tested. Drives should be formatted and partitions should be rearranged as needed. Care should be taken to ensure that all configurations are tweaked equally well (or poorly, as the case may be) during the multiple installations. Disk imaging software can help speed the process and eliminate human error.
  2. Know thy enemy. (aka The devil is in the details.)
    You must be prepared to do research. If you do not understand the product, you are only going through the motions and will not truly be able to test the product’s limits. Applications have needs. Hardware seeks to fulfill a particular need. It is a wonderful thing when both needs match up and an expensive waste when they don’t. Find the priorities of your applications and how they handle data. Do they rely more on RAM speed or drive speed? Will they be bottlenecked by your processor or network card? Do you know why the hardware is listed as a workstation instead of a server?

    Performance varies dramatically by the exact situation. A server with a RAID drive array can be an incredible Web server, especially when large graphics or files are involved. A server with a much slower drive array but with significantly more memory can be an even better Web server when the files and graphics are small enough to fit in RAM.

    The type of hardware can also make a difference. For example, the Rambus memory used on current Intel workstation and server motherboards is better at handling large sequential chunks of data. Lower cost SDRAM is quite competent when the data is small or very fragmented, such as when multiple applications are running. Rambus does provide an advantage for many CAD/CAM workstations and database servers but does little for software-compiling workstations or Web servers. Processor speed is vital for a remote-application server but much less so on a Web server.

    As a result, you should not be afraid to make tweaks to the hardware. Changing from Rambus to inexpensive SDRAM could provide enough of a discount to install RAID hard drives or a faster processor.

    As an IT professional, your goal is to get the best performance for your dollar. Take advantage of loaner or test hardware to find the best solution for your needs.
  3. Don't lose sight of your goals.
    There is one additional gotcha: becoming a testaholic. It is incredibly easy to fall into the habit of testing every aspect of every product you encounter in every possible configuration. Thoroughness is a virtue, but not when taken to the extreme. Remember that your time is also a factor. Is it really worth spending another eight hours running tests just to save $150 worth of memory on a server? Maybe it is if you plan on buying a few dozen machines, but probably not if this is a one-shot machine.
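
As promised in gotcha 1, here is a rough sketch of how you might observe the disk zone effect yourself. It samples sequential read speed at a few evenly spaced offsets across a raw disk device; the device path is an assumption, the script needs administrator privileges, and although it only reads, treat it as an illustration rather than a polished tool.

```python
import os
import time

DEVICE = "/dev/sda"               # assumption: adjust for your system
CHUNK = 64 * 1024 * 1024          # read 64 MB at each sample point
SAMPLE_POINTS = 5                 # evenly spaced positions across the disk

def read_throughput(fd, offset, size):
    # Sequentially read `size` bytes starting at `offset`, return bytes/second.
    os.lseek(fd, offset, os.SEEK_SET)
    start = time.perf_counter()
    remaining = size
    while remaining > 0:
        data = os.read(fd, min(1024 * 1024, remaining))
        if not data:
            break
        remaining -= len(data)
    return (size - remaining) / (time.perf_counter() - start)

if __name__ == "__main__":
    fd = os.open(DEVICE, os.O_RDONLY)
    disk_size = os.lseek(fd, 0, os.SEEK_END)   # seek to end to find device size
    for i in range(SAMPLE_POINTS):
        offset = (disk_size - CHUNK) * i // (SAMPLE_POINTS - 1)
        rate = read_throughput(fd, offset, CHUNK)
        print(f"offset {offset // (1024 ** 3):4d} GB: {rate / 1024 ** 2:6.1f} MB/s")
    os.close(fd)
```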

Conclusion
Benchmarking is a useful tool that every IT professional should have in his or her bag of tricks. Even if you never run a formal benchmark, the knowledge will let you cut through marketing hype, which can help you avoid costly decisions. After all, isn’t that why you read these articles?
