By Mark Strawmyer
Indexing in a relational database creates a performance tradeoff that is often overlooked. The more indexes you have, the more likely you will retrieve data from the system quickly. However, it's equally true that the more indexes you have, the longer it will take to insert new data into the system. In this article, you’ll learn about the different kinds of indexes supported by Microsoft SQL Server, the different ways that you can implement indexes, and what you can do to get more read performance than you give up in performance.
Definition of an index
Indexes are database tools that increase the system's ability to retrieve data by not scanning all of the data records in search of the desired record(s). Indexes can change the organization of data so that it is structured similarly to how it will be accessed for retrieval. The indexes are created on column(s) to help the database find data based upon the value(s) contained in those indexed column(s).
Types of indexes
There are two indexes supported by Microsoft SQL Server: clustered and nonclustered. Clustered indexes dictate the physical order of data stored in a table. Since a table can only have one physical order, there can only be one clustered index per table. Clustered indexes are efficient when searching for ranges of data since the data is already physically ordered.
Nonclustered indexes do not impact the underlying physical storage, but rather are made up of pointers to data rows. If a clustered index exists, the pointers in nonclustered indexes contain a reference to the location in the clustered index. These indexes are more compact than the data and can be scanned faster than the actual data table.
How to implement indexes
Some indexes are automatically created by the database. For example, Microsoft SQL Server automatically creates unique indexes to enforce UNIQUE constraints, which ensures no duplicate data can be inserted. Other indexes can be created using the CREATE INDEX statement or through the SQL Server Enterprise Manager, which includes an index creation wizard to guide you.
Getting more performance
While indexes can provide a performance benefit, they also come at a cost. While SQL Server allows you to create up to 256 nonclustered indexes per table, it is not advisable. Indexes require additional storage space both in memory and on the physical disk drive. They also lead to a decrease in performance when performing insert statements because the data needs to be inserted according to the indexes instead of the first available space, which ensures an insert or update statement will take longer, the more indexes that exist.
Use the following list as a guideline to assist you when creating indexes in Microsoft SQL Server:
- Choose the right data types: There are certain data types that are more efficient when used in indexes than others. Int, bigint, smallint, and tinyint are all very good data types to index because they have a specific size and are easy to do comparison operations on. Other types such as char and varchar are much less effective because it is not easy to perform mathematical operations on them and comparison operations take longer.
- Ensure the indexes are actually used: When performing queries involving columns that are part of a cluster, it is important to pay attention to how the data is used. When functions are applied to the data columns, it invalidates the advantages of the sort. For example, when a date value is indexed and the date value is converted to a string for comparison purposes, the indexed date value is not used in the query.
- Pay attention to the order of columns when creating multicolumn indexes: The indexes are sorted based on the first column and then further sorted according to the order of each additional column in the index. The columns with the less unique data should be listed first in the index to ensure that the data is further sorted as it moves across the index.
- Limit the number of columns in clustered indexes: The more columns involved in the clustered index the more data will have to be stored in the nonclustered indexes that contain references to the clustered index. This increases the size of the tables that contain the indexes and thus increases the amount of time to search based on the index.
- Avoid clustered indexes on frequently updated columns: Since nonclustered indexes depend upon clustered indexes, if the columns comprising the clustered index are frequently updated, this will cause the row locators stored in the nonclustered indexes to also have to be updated. This leads to increased performance costs for all queries associated with these columns as locking occurs.
- Split operations (if possible): When inserts and updates need to be performed frequently on a table, as well as reads, try to separate the tables if possible. All of the inserts and updates can be performed on a table with no indexes, and then later replicated to the other table where there are a heavy number of indexes in place to optimize data reads.
- Rebuild the indexes properly: Nonclustered indexes contain pointers to the clustered indexes and thus have a dependency on clustered indexes. When clustered indexes are rebuilt, it can be done by first dropping the index and then using CREATE INDEX to recreate the index, or by including the DROP_EXISTING clause as a part of the CREATE INDEX statement. Performing the drop and create as separate steps will cause the nonclustered indexes to be rebuilt multiple times rather than just once when the DROP_EXISTING clause is used.
- Use fill factorwisely: Data is stored in contiguous pages that have a set size. When new rows are added to a data page that is full, the system must perform a page split which moves half of the data to a new page. This adds to system overhead and leads to fragmented data. The fill factor allows you to maintain gaps in the data when the index is built. This reduces the number of page splits that occur as data is inserted. The space is maintained only when the index is created and not as data is added or updated. Thus, indexes must be periodically rebuilt in order to continue to utilize the fill factor. The gaps left by the fill factor can lead to slower read performance as more disk accesses are required to read data since it is more spread out; so, it is important to consider whether the number of reads outweigh the number of write operations to determine whether a fill factor other than the default is appropriate for use.
Better query performance in Microsoft SQL Server can be accomplished through the efficient use of indexes, but the effective use of indexes is dependent on several implementation decisions. Making the right database management decisions regarding the performance tradeoffs of indexing can mean the difference between better performance and bogging down. The guidelines presented in this article will help you make the right decisions for your particular situation.