When a table grows to millions or even
billions of rows, performance can be a serious problem. The last
thing you want to do is a table scan. Unfortunately, you can’t do
some queries any other way.
There are, however, several modifications that
you can make. A RAID configuration that uses striping will help
because more heads are available to move across more surfaces.
Also, a multi-processor setup can sometimes be useful since
different queries or different parts of the same query can run on
different processors. (Note: In SQL 2000, this requires the use of
SQL 2000 Advanced Server.)
If these hardware tweaks are still insufficient
for your requirements, you might consider horizontal partitioning.
This approach requires multiple hard disks at the very least, and
ideally multiple servers.
Imagine a table containing 100 million rows.
For simplicity’s sake, you could divide this table into 10 tables,
each containing 10 million rows, and named Table_01 through
Table_10. Each table will contain a range of rows, the range being
defined by primary key value. This works best if the original
table’s primary key is an identity key, but it will also work if
the primary key values are at least logically divisible. In other
words, you need to know in which table to look for any given key
value.
The next step is to define the tables, adding a
check constraint to enforce that any new row’s key lies within the
range. For example:
— On Server01:
CREATE TABLE Table_01
(ID INTEGER PRIMARY KEY
CHECK (CustomerID BETWEEN 1 AND
10000000),
… — more column definitions)
— On Server02:
CREATE TABLE Table_02
(ID INTEGER PRIMARY KEY
CHECK (CustomerID BETWEEN 10000001 AND
20000000),
… — more column definitions)
— and so on
Move the data from the original table into the
partitions. Now you should create a linked server definition on
each of the servers, specifying the connection information enabling
each server to connect to the others.
Then, create a distributed partition view on
each server. (You can actually create one view and copy it to the
other servers.) This view UNIONs the partitions into a single
logical table. For clarity, it’s best to fully qualify the names of
the table partitions in the view:
CREATE VIEW Table_00 AS
SELECT * FROM
Server01.MyDatabase.TableOwner.Table_01
UNION ALL
SELECT * FROM
Server02.MyDatabase.TableOwner.Table_02
UNION ALL
SELECT * FROM
Server03.MyDatabase.TableOwner.Table_03
— and so on
Now the database application can refer to
Table_00 without regard to the physical location of its partitions.
You can index each of the partitions the same way as the original
and, in these columns, the data will usually not correspond to the
key ranges (e.g., SalespersonID has nothing to do with the primary
key range). But even in the worst case scenario (such as a table
scan), each processor will search one tenth of the rows, and all of
the processors will search at once. The result will be
substantially quicker than a scan of the original table.
We’ve only considered SELECT queries, but
chances are that you’ll also want to insert and update rows. To do
so, you need an updatable partitioned view. A partitioned view is
updatable if it can meet the following two conditions:
- The view is built from a series of SELECT
statements, each of which references one table partition (linked or
local), all UNIONed together to form the logical equivalent of the
original table. - Triggers and cascading updates and deletes
may not be defined on any of the table partitions.
In addition, you should be aware of these four
rules:
- The view can reference each table partition
only once. - Computed columns cannot be indexed.
- The primary key definitions must be
identical. - All columns must be referenced in the SELECT
list. The simplest way to do that is SELECT *.
You may not have to partition tables
horizontally very often, but when the need arises, you’ll need
hardware resources to make it work. If at all possible, keep the
linked servers on the same hub or at least the same subnet.
TechRepublic’s free SQL Server newsletter, delivered each Tuesday, contains hands-on tips that will help you become more adept with this powerful relational database management system. Automatically subscribe today!