Data Management

Testing databases that contain a million rows with SQL

Benchmark testing can be a waste of time if you don't have a realistic data set. Arthur Fuller exhibits how you can set up your test to accommodate for more than one million rows, thus returning more accurate test results.

TechRepublic's free SQL Server newsletter, delivered each Tuesday, contains hands-on tips that will help you become more adept with this powerful relational database management system. Automatically subscribe today!

When you perform benchmark tests, you likely follow this course of action: design a database containing no rows in any table (because all queries are lightning fast), populate the database with a few rows of test data, and then test the database. However, how accurate will your test results be if you know that the production database will contain millions of rows in the heavily-trafficked tables?

In order to get an idea of what performance will be like when millions of rows are involved, you need to manufacture a million or so rows, beginning with an existing table that has 1,000 rows. You'll need to give all of the rows a primary key. Also, you'll do each job separately, as shown in the following code:

USE northwind
SELECT COUNT(*) FROM [Source Table]

SELECT *
    INTO TempTable
    FROM [Source Table]
SELECT COUNT(*) FROM [TempTable]
GO

DECLARE @i Integer
SET @i = 1
WHILE @i < 10
    BEGIN
        INSERT INTO TempTable
            SELECT * FROM TempTable
            SET @i = @i + 1
    END

SELECT COUNT(*) FROM TempTable

SELECT Identity( int, 1, 1 ) AS PK, *
    INTO BigTable
    FROM TempTable
GO

SELECT COUNT(*) FROM BigTable
GO

DROP TempTable

CREATE CLUSTERED INDEX
    BigTable_Index
ON
    BigTable (PK)

This code manufactures a million rows from the source table nine times; then, it adds an Identity primary key to the table using the SELECT Identity construct.

Now you can perform your benchmark tests with a realistic data set. If you need 10 million rows for a realistic data set, just modify the WHILE @i line to set a new upper limit.

Note: Although it's possible to write a tool that will manufacture data, it's beyond the scope of this tip. However, there are tools available that manufacture data. Sybase PowerDesigner contains a pretty smart one that lets you populate text files with sample data for each table and then manufactures various combinations according to the relational tree.

Editor's Picks