Data Management

Database benchmark wars: What you need to know

How do you separate the reality from the hype in database performance benchmarks? Database guru Don Burleson spots some tricks vendors use to tweak their benchmark results.

With a flood of advertising from leading database platform providers like Microsoft, Oracle, Sybase, and Informix, development managers may find themselves fishing for reliable benchmarking figures for relational database management systems (RDBMS).

Every database vendor cites benchmark data that “proves” its database is the fastest, and each vendor chooses the benchmark test that presents its product in the most favorable light. So the Transaction Processing Performance Council (TPC) was created in an effort to provide uniform benchmark tests.

The TPC also recognized that different types of applications have different processing signatures. They developed TPC-C benchmark for OLTP benchmarks, TPC-R and TPC-H (formerly TPC-DS) benchmarks for data warehouses and decision support systems, and the TPC-W benchmark for Web-based systems. The TPC-C benchmark test is considered the de facto standard for online transaction processing (OLTP) database benchmarks, primarily because TPC-C attempts to simulate real-world OLTP database transactions.

Despite their best efforts
However, the TPC was unable to prevent database vendors from fudging benchmark results. The biggest issue is the different hardware platforms databases run on. For example, IBM’s UDB database (formerly DB2) only runs on the IBM 3090 mainframe, while the Informix database only runs on UNIX. So it’s difficult to compare benchmarks for these products because of the vastly different machine architectures.

How database vendors fudge benchmark results
In their efforts to “prove” that their database product is superior to the competition, database vendors employ numerous tricks to improve the processing speed of their benchmarks. Virtually all of the tricks employed by database vendors involve caching data and SQL in RAM.

Remember, database performance is all about disk I/O, and vendors use large RAM memory regions to preload critical benchmark components to avoid any disk access. Some of their tricks:
  • Buffering up data rows—By preloading the data into the RAM buffers, database can access the information thousands of times faster then a disk I/O access.
  • Storing SQL execution plans in RAM—By preparsing and precomputing the execution plans for the SQL, the database vendors bypass the overhead of parsing and invoking the SQL optimizer to generate the execution plan.
  • Prejoining tables—Some database products have special preaggregation mechanisms to prejoin tables. For example, Oracle has Materialized Views that can store the results of an n-way table join, allowing super-fast data access.
  • Using high-speed CPUs and clusters—Database vendors can dramatically improve benchmark speeds by using special high-speed machines and cluster architectures.

Using tricks like these, all database vendors find ways to make their product superior to the competition. Therefore, most database benchmark data should be viewed with healthy skepticism.

So how do we compare database speed?
For the manager charged with comparing database performance, benchmark studies present a huge challenge. While it's not fair to say that database benchmarks are meaningless, it's very difficult to use benchmarks to “prove” that any single database product is faster than another. To see how difficult it is to compare database performance, just look at the TPC publication on a variety of TPC-C benchmarks for various hardware and database platforms.

Choosing your database
Given this wealth of confusing and conflicting information, how do you choose the proper database product? Of course, performance is only one of many factors in evaluating a database. You must also consider the availability of trained DBAs, the vendor’s technical support, and total cost of ownership, among other factors.

Most databases can be configured to process hundreds of transactions per second. Based on that, this is what we can say about database performance:
  • Hardware matters—Database performance is largely a function of hardware. A poorly performing database can be made to appear fast with cached disk arrays and super-fast processors. Even “independent” benchmarks can be misleading because of the varied hardware, disk, and network configurations.
  • Knowledge is speed—Delivery of high-speed transactions requires knowledge of the application. High-speed database performance is usually achieved through complicated caching tricks and preaggregation schemes. This requires detailed knowledge of the application and its I/O signatures.

In sum, it's difficult to evaluate database performance objectively. The savvy manager must carefully evaluate all of the database vendors with a jaundiced eye to separate the hype from the reality.

Editor's Picks