As the sophistication of your Microsoft SQL Server database implementation increases, the need to optimize performance will also increase. Understanding how the SQL Server optimizer works is the first step toward the establishment of a truly optimized database environment.
While certain queries may, by their very nature, require little or no cost estimation or comparison between plans to determine optimum performance, most queries will benefit from full optimization. One of the most beneficial methods for increasing the performance of queries is the creation of efficient indexes. A well-constructed index will allow a query to avoid the necessity of scanning the entire table for results.
When you create an index, SQL Server automatically measures and stores the corresponding statistical information regarding the distribution of values in the indexed column. This statistical information is used by the optimizer to determine the optimal strategy for evaluating a query.
There are two types of indexes: clustered and non-clustered, each with unique advantages depending on the data set.
A clustered index dictates the storage order of the data in a table. Because the data is sorted, clustered indexes are more efficient on columns of data that are most often searched for ranges of values. This index type also excels at finding a specific row when the indexed value is unique.
A non-clustered index is similar to an index in a textbook where the data is stored in one place and the data value in another. A query searches for the data value by first searching the non-clustered index to find the location of the data value in the table and then retrieves the data directly from that location. The non-clustered index is useful for queries resulting in exact matches.
Get a better handle on the intricacies of Microsoft SQL Server with these other TechRepublic articles: "Learn these SQL Server survival skills for non-DBAs"
"Increase your SQL power with user-defined functions"
"An introduction to SQL Server Profiler"
As a general rule, indexes should be kept as narrow as possible, most likely following a business use requirement, to reduce the amount of processing overhead associated with each query. Analyzing and optimizing query performance requires the measurement and collection of statistical data.
SQL Server maintains statistical performance data about index keys. If configured to do so, non-indexed keys can also be statistically measured.
There are several basic statistical concepts a database administrator should understand regarding performance optimization.
- Cardinality: measures how many unique values exist in the data set.
- Density: measures the uniqueness of values within a data set. Density is determined by dividing the number of rows that correspond to a given key by the number of rows in the table. Indexes with high densities will likely be ignored by the optimizer.
- Selectivity: measures the number of rows that will be returned by a particular query. Selectivity is determined by dividing the number of keys requested by the number of rows they access. To calculate the relative cost of a query plan, the optimizer needs a valid selectivity measure.
As the data in a column changes, index and column statistics can become out-of-date and cause the query optimizer to make less than optimal decisions on how to process a query. Therefore, SQL Server automatically updates this statistical information periodically as the data in the tables change. The cost of this automatic statistical update is minimized by sampling the data, rather than analyzing all of it.
Designing and designating indexes in a complex database table can be a daunting task. Fortunately, SQL Server has a built-in Index Tuning Wizard to help you establish an optimal set of statistics and indexes. Running the wizard will provide a list of suggestions for improving the query performance of your database based on a scripted workload.
With a deeper understanding of how the SQL Server query optimizer works, you will have the knowledge you need to implement only the recommendations of the Wizard that apply to your particular situation. However, as with any dynamic system, your optimal database performance analysis will need to be updated periodically. Knowing what each statistical measurement actually means with regard to query index performance will give you a good basis for making your management decisions.
Mark Kaelin is a CBS Interactive Senior Editor for TechRepublic. He is the host for the Microsoft Windows and Office blog, the Google in the Enterprise blog, the Five Apps blog and the Big Data Analytics blog.