This article is also available as a TechRepublic download, which includes the code listings in a more manageable text file format.

Any database programmer will tell you that in high-traffic
database-driven applications, a single badly-designed SQL query can significantly impact the overall performance
of your application. Not only does such a query consume more database time than
it ideally should, but it can have an exponential effect on the performance of
other application components.

Optimizing query performance is as much a black art as a
science, as heavily dependent on the developer’s intuition as on hard
statistical performance data. Fortunately, databases likes MySQL come with some tools to aid the process, and this
article discusses three of them briefly: using indexes, analyzing queries with EXPLAIN, and
adjusting MySQL’s internal configuration.

#1: Using indexes

MySQL allows you to index database tables, making it
possible to quickly seek to records without performing a full table scan first
and thus significantly speeding up query execution. You can have up to 16
indexes per table, and MySQL also supports multi-column indexes and full-text
search indexes.

Adding an index to a table is as simple as calling the CREATE INDEX command and
specifying the field(s) to index. Listing
shows you an example:

Listing A

mysql> CREATE INDEX idx_username ON users(username);
Query OK, 1 row affected (0.15 sec)
Records: 1  Duplicates: 0  Warnings: 0

Here, indexing the username field of the users table ensures
that SELECT queries which reference this field in their WHERE or HAVING clause will
run a little faster than in the pre-indexed state. You can check that the index
was created (Listing B) with the SHOW INDEX command:

Listing B

mysql> SHOW INDEX FROM users;
| Table | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
| users |          1 | idx_username |            1 | username    | A         |      NULL |     NULL | NULL   | YES  | BTREE      |         |
1 row in set (0.00 sec)

It’s important to note that indexes are a double-edged
sword. Indexing every field of a table is usually unnecessary, and is quite
likely to slow things down significantly when inserting or updating data
because of the additional work MySQL has to do to rebuild the index each time.
On the other hand, avoiding indexes altogether isn’t such a great idea either,
because while this will speed up INSERTs, it will
cause SELECT operations to slow down. There is thus always a trade-off
to be made, and it’s wise to consider what the primary function of the table
will be (data retrieval or data edit) when designing the indexing system.

#2: Optimizing query performance

When analyzing query performance, it’s also useful to
consider the EXPLAIN keyword. This
keyword, when placed in front of a SELECT query,
describes how MySQL intends to execute the query and the number of rows it will
need to process to successfully deliver a result set. To illustrate, consider
the following simple example (Listing C):

Listing C

mysql> EXPLAIN SELECT, city.district FROM city, country WHERE city.countrycode = country.code AND country.code = ‘IND’;
| id | select_type | table   | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
|  1 | SIMPLE      | country | const | PRIMARY       | PRIMARY | 3       | const |    1 | Using index |
|  1 | SIMPLE      | city    | ALL   | NULL          | NULL    | NULL    | NULL | 4079 | Using where |
2 rows in set (0.00 sec)

Here, the query is structured as a join between two tables
and the EXPLAIN keyword
describes how MySQL will process the join. It should be clear the current
design will require MySQL to process only one record in the country table (which is indexed) but all 4079
records in the city table (which
isn’t). This then suggests scope for improvement using other optimization
tricks – for example, adding an index to the city table as follows (Listing D):

Listing D

mysql> CREATE INDEX idx_ccode ON city(countrycode);
Query OK, 4079 rows affected (0.15 sec)
Records: 4079  Duplicates: 0  Warnings: 0

And now, when you re-run the query with EXPLAIN, you’ll see a noticeable improvement (Listing E):

Listing E

mysql> EXPLAIN SELECT, city.district FROM city, country WHERE city.countrycode = country.code AND country.code = ‘IND’;
| id | select_type | table   | type  | possible_keys | key       | key_len | ref   | rows | Extra       |
|  1 | SIMPLE      | country | const | PRIMARY       | PRIMARY   | 3       | const |    1 | Using index |
|  1 | SIMPLE      | city    | ref   | idx_ccode     | idx_ccode | 3       | const |  333 | Using where |
2 rows in set (0.01 sec)

As this illustrates, MySQL now only needs to scan 333
records in the city table to
produce a result set — a reduction of almost 90 percent! Naturally, this
translates into faster query execution time and more efficient usage of
database resources.

#3: Adjusting internal variables

MySQL is so open that it’s fairly easy to further fine-tune
its default settings to obtain greater performance and stability. Some of the
key variables that should be optimized are listed below.

  • Altering
    Index Buffer Size (
    variable controls the size of the buffer used when handling table indices
    (both read and write operations). The MySQL manual recommends that this
    variable be increased “to as much as you can afford” to ensure
    you get the best performance on indexed tables, and recommends a value
    equivalent to about 25 percent of the total system memory. This is one of
    the more important MySQL configuration variables and if you’re interested
    in optimizing and improving performance, trying different values for the key_buffer_size variable
    is a good place to start.
  • Altering
    Table Buffer Size (
    a query requires a table to be scanned sequentially, MySQL allocates a
    memory buffer to this query. The read_buffer_size variable
    controls the size of this buffer. If you find that sequential scans are
    proceeding slowly, you can improve performance by increasing this value,
    and hence the size of the memory buffer.
  • Setting
    The Number Of Maximum Open Tables (
    variable controls the maximum number of tables MySQL can have open at any
    one time, and thus controls the server’s ability to respond to incoming
    requests. This variable is closely related to the max_connections
    variables — increasing this value allows MySQL to keep a larger number of
    tables open, just as increasing max_connections
    increases the number of allowed connections. Consider altering this value
    if you have a high-volume server which receives queries on multiple
    different databases and tables.
  • Deciding
    A Time Limit For Long Queries (
    comes with a so-called “slow query log”, which automatically
    logs all queries that do not end within a particular time limit. This log
    is useful to track inefficient or misbehaving queries, and to find targets
    for optimization algorithms. The long_query_time variable
    controls this maximum time limit, in seconds.

The previous discussion should give you some insight into
three tools you can use to analyze and optimize your SQL queries, and help you
squeeze better performance out of your application. Go on and try them out — and
happy optimizing!