One of the primary jobs of a database administrator is to predict when the system will need additional machine resources and ensure that it gets those resources before the database experiences performance problems. To meet this duty, the administrator must have the information that shows how the database server consumes resources. In this Daily Feature, I’ll show you how to gather the data you need to form a cohesive picture of server performance and use that information to keep your database from bogging down.
Establish a baseline
Sadly, Oracle database servers vary widely in resource usage. They can be stressed one minute and idle the next. For example, it’s common for an Oracle server to be at 100 percent utilization from 8:00 A.M. to 5:00 P.M. and then revert to 90 percent idle for the remaining hours of the day. These swings in usage generate misleading “average utilization” metrics.
For Oracle optimization to be effective, you must gear it to those times when the database is performing at peak levels. Even if the server is idle all night, you’ll still need to have enough CPU, RAM, and disk bandwidth to keep response time fast during the day, when all of your users are hammering it.
For a simple example, consider a sample plot of vmstat “user CPU” plus “system CPU” from a large Solaris Oracle database server, as shown in Figure A. While the average CPU usage is only about 40 percent, there are many spikes where the CPU is 100 percent busy.
The manager must decide to upgrade (by adding CPUs) to accommodate the 100 percent CPU spikes or wait until the stress becomes more frequent. This is an example of the sort of statistics you can gather about the database server. Now I’ll examine how to use this kind of information to optimize your server. The optimization rules for Oracle in a server environment are actually quite simple. They deal with disk I/O, RAM, and CPU consumption on the server. I’ll examine each in succession.
Oracle handbooks
Please note that Oracle database server tuning is extremely complex, and entire books have been devoted to the subject. The best are the Oracle9i for Windows Handbook (2002, Oracle Press) and Oracle9i UNIX Administration Handbook (2002, Oracle Press).
Configuring the disk subsystem for Oracle databases
All Oracle databases heavily access your server’s hard drives. Tracking, managing, and controlling disk activity is critical for a database administrator who wants to make the most efficient use of the server. To see how disk tracking works, let’s take a look at some sample reports that are used for disk hardware capacity planning.
Oracle provides a utility called STATSPACK for tracking disk I/O. You can easily extend STATSPACK to keep track of the amount of space used within the whole database. You can use this information to plot the overall database growth rate and predict the future database size. This type of report is especially useful for ensuring that there’s enough disk to accommodate the future growth of the database system. You can see a sample of such a report in Listing A. To generate such a report, you can use code similar to that shown in Listing B.
But there’s more to disk monitoring than measuring the total disk consumption. You must also measure total disk I/O and track the number of read and write operations over long periods of time. It’s interesting to note that common signatures often begin to appear when you capture and plot disk I/O information for Oracle databases. For example, in Figure B you can see regular spikes in CPU consumption that follow a clear, repeating pattern. You can use the code in Listing C to produce an Oracle STATSPACK report on disk I/O.
As you may know, RAM caches are used to minimize disk I/O. On the disk storage, mass storage disk devices (such as EMC disk arrays) contain several gigabytes of RAM cache, and the Oracle database includes an area called db_cache_size (formerly db_block_buffers in Oracle8i), which contains RAM buffers for incoming disk blocks.
When optimizing the disk I/O subsystem on the database server, it’s also important to take a look at the overall disk configuration. This includes the RAID configuration for the disks and the mapping between the disks and the physical data files.
The most popular disk configuration for Oracle databases is RAID 0+1, which is also RAID 10. RAID 0+1 combines disk mirroring with block-level striping. The disk mirroring is insurance against a failed disk, while the block-level striping ensures that no individual disk becomes overburdened with I/O.
With block-level striping, each data block in a file is distributed across the stripe set. For example, a file named customer.dbf might physically exist on eight different disk spindles. Hence, all the I/O will be randomly distributed across all the Oracle database files, eliminating the possibility that any one spindle will experience contention.
Optimizing server RAM for Oracle
The Oracle database needs enough RAM to allocate the System Global Area (SGA) and also enough reserve RAM available for the Program Global Area (PGA) for any session that will be connecting to the database server. In an optimal environment, you’ll have enough RAM on the database server to handle the maximum number of connected users, plus the RAM consumed by the database SGA region. A database that is short on system memory resources will experience RAM paging, a condition in which memory frames are written to the swap disk in order to make enough memory available for competing tasks.
In Oracle9i, you can use the pga_aggregate_target parameter, which provides shared RAM resources for PGAs. However, you’ll still need a simple way to measure RAM consumption by individual Oracle sessions. A huge amount of RAM consumption takes place when a session sorts a result set from Oracle, so monitoring sorts can give you an idea of the amount of RAM used on the database server. Listing D contains a STATSPACK script for extracting sorting information. Figure C shows a plotted graph of the results. In the figure, you can see clear, repeating patterns of high RAM use, much like a cardiogram of a heartbeat.
Oracle CPU optimization
Because of the huge changes in an Oracle database’s CPU demands, determining the optimal amount of CPU resources for Oracle is a challenge. Some databases can experience hundreds or thousands of transactions per second, so the load on an Oracle database can vary significantly at different points in time.
Oracle users who want to be sure that they always have uniform response times will commonly look at the server run-queue metric that is captured within the UNIX vmstat utility. The run queue is the number of tasks waiting for execution, including those tasks that are currently being serviced.
In other words, if your UNIX server with eight CPUs has a run queue of nine, all eight CPUs will be busy processing tasks. A ninth task will wait for an opportunity to be serviced by the CPU. Whenever the run queue exceeds the number of CPUs on the database server, the Oracle database can experience a CPU bottleneck. You should make additional CPU resources available to that server in order to eliminate the bottleneck.