Due to popular demand, we're back with the second segment in our series of articles covering the basics of the SQL standard. Thanks to everyone for your feedback! In the last article, we covered some database terminology and looked at the four basic data query types. We also explained the WHERE clause and conditional statements and provided examples of the various query types.
In this article, we’ll pick up where we left off with some additional functions and clauses you can use with the basic SELECT data query.
SELECT options refine results
As we saw in the last article, the SQL SELECT statement has the widest variety of query options, which are used to control the way data is returned. These options come in the form of clauses, keywords, and functions.
A clause is a statement that modifies results. Clauses are not required statements, but refine what and how data is displayed. The WHERE clause in a query is one example.
A keyword triggers functionality within the database. These are sometimes required with queries, such as INTO and VALUES in the statement “INSERT INTO table_name (column1) VALUES (‘data1’);”. We’ll look at the DISTINCT keyword, which triggers some useful optional functionality.
Functions are built-in statements that apply logic to a result set. We’ll cover several options of this type.
I’ve summarized the most common of these clauses, keywords, and functions below. I’ll explain each one and give examples of usage in the sections that follow.
- ORDER BY—A clause that returns the result set in a sorted order based on specified columns
- DISTINCT—A keyword that returns only unique rows within a result set
- COUNT—A function that returns a numeric value which equals the number of rows matching your query
- AVG—A function that returns the numeric value that equals the average of the numbers in a specified column
- SUM—A function that adds the numbers in a specified column
- MIN—A function that returns the lowest non-null value in a column
- MAX—A function that returns the largest value in a column
- GROUP BY—A clause that summarizes the results of a query function by column
Use ORDER BY to sort results
The ORDER BY clause allows your database to sort your results so that you don’t have to do it “manually” in your application code. The ORDER BY clause must come at the end of a query statement.
Basic usage is as follows:
SELECT * FROM Contacts ORDER BY first_name;
You are free to use ORDER BY with any select statement that might return multiple rows. You can also use it in conjunction with other clauses:
SELECT first_name, last_name FROM Contacts WHERE first_name BETWEEN ‘a’ AND ‘k’ ORDER BY last_name;
You can give multiple columns to sort by. Precedence is given from left to right, so the order in which you list your columns is important.
SELECT * FROM Contacts ORDER BY company, last_name, first_name;
Results are displayed in ascending order by default, either numerically or alphabetically. You can change this behavior by including the DESC keyword following any column name in the ORDER BY clause. In the example below, the highest net_amount will be listed first (in descending order). If two or more rows contain the same net_amount value, they will be displayed showing first the entry with the last_name value that comes earliest in the alphabet, because the last_name column is still sorted in ascending order.
SELECT * FROM Sales ORDER BY net_amount DESC, last_name, first_name;
After running out of defined column names to sort by, most databases will then sort by the first column in the table and work toward the right. Implementation in this area varies though, so if the sort order is important, you should explicitly define which columns to use.
Another handy thing to note is that with the ORDER BY clause (as with the WHERE clause), the columns you are using to sort the results do not have to be part of the returned result set. The following example is perfectly valid, as long as all referenced columns exist in the table:
SELECT company, first_name, net_amount FROM Sales ORDER BY start_date, last_name;
DISTINCT returns unique results
The DISTINCT keyword returns only unique rows within a result set. You may need to find out, for example, which companies are represented in your Sales table, but you don’t want to see every entry. You can use DISTINCT to return one row for each unique company name:
SELECT DISTINCT company FROM Sales;
When you use DISTINCT, it applies to all requested columns. If you want a list of all the salespeople in your table and the companies they represent but not every sales entry, you can use the following statement. Note that this may return several entries from the same company, etc. DISTINCT applies to the entire requested result set.
SELECT DISTINCT company, last_name, first_name FROM Sales;
You can still use DISTINCT when narrowing and sorting your results, as with any SELECT statement. To determine what is displayed, the database first establishes whether the refined request matches a row, then applies the DISTINCT function. The ORDER BY clause is always processed after the entire result set has been determined. In the following example, only rows in which the net_amount is greater than 100 will be returned. Since DISTINCT keeps the first encountered row that matches the query result set and discards the rest, the net_amount column referenced in the ORDER BY statement may seemingly yield randomly ordered results.
SELECT DISTINCT company, last_name, first_name FROM Sales WHERE net_amount > 100 ORDER BY company, net_amount;
Functions apply logic
Functions that return a single value are referred to as aggregate functions. When accessing the results of one of the following aggregate functions from your application, the “field name” containing the results will be the actual function you used. For example, when parsing your database results, the key in your results array may look like one of the following:
$keyname = “COUNT(*)”;
$resultkey = “AVG(net_amount)”;
The COUNT function tells you how many rows are in a result set. As with all functions, it accepts one parameter. This basic example will tell you how many rows are in your table:
SELECT COUNT(*) FROM Sales;
You can also use it to count the number of rows in any result set.
SELECT COUNT(*) FROM Sales WHERE net_amount > 100;
If you want to see how many rows contain non-null values for a particular column, you can use COUNT on that column. Note that this will return the total number of rows unless the database is configured to default to NULL when a field is empty for the column in question. Also, listing more than one column name will cause an error.
SELECT COUNT(company) FROM Sales;
COUNT can also be applied to count the number of rows in a DISTINCT result set.
SELECT COUNT(DISTINCT company, last_name) FROM Sales;
The COUNT statement is frequently used to determine the number of iterations to use for a FOR loop in your code.
AVG returns the average of all the fields in a column with a numeric data type. It accepts one column name as its parameter, and it will return “0” if it's used on a non-numeric column.
SELECT AVG(net_amount) FROM Sales;
You can limit the scope of this function with clauses.
SELECT AVG(net_amount) FROM Sales WHERE company LIKE ‘%ABCD Co%’;
As with all aggregate functions, the ORDER BY statement will be ignored.
SUM works just like AVG, except it returns the sum of values in all fields in the result set.
SELECT SUM(net_amount) FROM Sales WHERE net_amount > 100;
The AVG, SUM, MIN, and MAX functions will return an error if you do not specify a column, so you can't use the ‘*’ wildcard.
MIN returns the lowest, non-null value in the specified column. If the column is a numeric data type, the result will be the lowest number. If it's a string data type, it will return the value that comes first alphabetically.
SELECT MIN(net_amount) FROM Sales WHERE last_name = “Smith”;
SELECT MIN(last_name) FROM Sales;
MAX works just like MIN, only it returns the highest non-null value. It too can be used on strings or numbers.
SELECT MAX(net_amount) FROM Sales;
SELECT MAX(company) FROM Sales WHERE net_amount > 100;
The MAX function is sometimes used on columns containing an auto-incremented key field to determine what the next entry’s key ID will be. Unless you’re running a nonpublic database, be wary of using this information to insert the next entry, in case another user beats you to the punch.
GROUP BY makes functions more useful
While all these functions provide helpful information, the GROUP BY clause lets you apply those functions to a subset of the fields in a column. Rather than running a query containing the MAX function once for every company in your Sales table, you can run one query with the GROUP BY clause to get the same results:
SELECT company, MAX(net_amount) FROM Sales GROUP BY company;
This gives you the highest net_amount for each company. You can also use this statement while selecting multiple column names and use multiple columns to group the function results.
The following examples illustrate the various ways this will produce results. First, including the GROUP BY clause lets you specify additional columns to display. However, be aware that this example will return the first last_name value encountered in the group; the SUM(net_amount) displayed will be for the entire company and not just rows matching the last name. This is because we're using only the company field to define our group.
SELECT company, last_name, SUM(net_amount) FROM Sales GROUP BY company;
In the above example, the last_name column provides pretty useless information, but you're allowed to request it in your query because of the functionality in the next example. You can create groups defined by multiple columns. This will produce function results for unique rows in the result set created by the combination of all specified GROUP BY columns:
SELECT company, AVG(net_amount), last_name FROM Sales GROUP BY company, last_name;
The above example provides the average net_amount for each distinct last name in each company. The order in which you list GROUP BY columns controls the sort order of your results, but the actual function value results will be the same.
Another example shows how you can group results without displaying the grouped columns. This is useful, for example, if you want to show the number of sales per person but keep the names anonymous:
SELECT company, COUNT(sale_id) FROM Sales GROUP BY company, last_name;
Limiting queries that use GROUP BY
As you saw above, you can limit the scope of queries that use these concepts with the WHERE clause. The WHERE clause will be evaluated first, then the function applied. This still holds true when using groups.
SELECT company, AVG(net_amount), FROM Sales WHERE net_amount > 100 GROUP BY company;
The above example will apply the AVG function only to rows that qualified under the WHERE restriction. Note that the WHERE clause must precede the GROUP BY clause. You can also limit the result set returned after the groups are evaluated with the HAVING statement.
SELECT company, AVG(net_amount), FROM Sales WHERE last_name BETWEEN ‘a’ AND ‘m’ GROUP BY company HAVING AVG(net_amount) > 500;
The above statement calculates the average net_amount for each company, evaluating only the values for people whose last names meet the criteria, and displaying only the results that are greater than 500.
A SELECT group of programmers with a DISTINCT advantage
If you’re still with us, congratulate yourself—you survived the second part of our series on SQL basics! You've learned many of the basic concepts used with SELECT statements when querying a single table, including sorting results and retrieving useful information with the DISTINCT keyword and aggregate functions. As with the previous segment, if you found this helpful or you have questions, send us an e-mail and I’ll follow up with future articles.