This article is also available as a PDF download.

Arranging data in a manner that’s
meaningful can be a challenge. Sometimes all you need is a simple sort. Often,
you need more–you need groups you can analyze and summarize. Fortunately, SQL
offers a number of clauses and operators for sorting, grouping, and
summarizing. The following tips will help you discern when to sort, when to
group, and when and how to summarize. For detailed information on each clause
and operator, see Books Online.

#1: Bring order with a sort

More often than not, all your data
really needs is a little order. SQL’s ORDER BY clause organizes data in
alphabetic or numeric order. Consequently, similar values sort together in what
appear to be groups. However, the apparent groups are a result of the sort;
they aren’t true groups. ORDER BY displays each record whereas a group may
represent multiple records.

#2: Reduce similar values into a group

The biggest difference between
sorting and grouping is this: Sorted data displays all the records (within the
confines of any limiting criteria) and grouped data doesn’t. The GROUP BY
clause reduces similar values into one record. For instance, a GROUP BY clause
can return a unique list of ZIP codes from a source that repeats those values:

SELECT ZIP
FROM Customers
GROUP BY ZIP

Include only those columns that
define the group in both the GROUP BY and SELECT column lists. In other words,
the SELECT list must match the GROUP BY list, with one exception: The SELECT
list can include aggregate functions. (GROUP BY doesn’t allow aggregate
functions.)

Keep in mind that GROUP BY won’t sort the resulting groups.
To arrange groups alphabetically or numerically, add an ORDER BY clause (# 1).
In addition, you can’t refer to an aliased field in the GROUP BY clause. Group
columns must be in the underlying data, but they don’t have to appear in the
results.

#3: Limit data before it’s grouped

You can limit the data that GROUP
BY groups by adding a WHERE clause. For instance, the following statement
returns a unique list of ZIP codes for just Kentucky customers:

SELECT ZIP
FROM Customers
WHERE State = 'KY'
GROUP BY ZIP

It’s important to remember that
WHERE filters data before the GROUP BY clause evaluates it.

Like GROUP BY, WHERE doesn’t support aggregate functions.

#4: Return all groups

When you use WHERE to filter data,
the resulting groups display only those records you specify. Data that fits the
group’s definition but does not meet
the clause’s conditions won’t make it to a group. Include ALL when you want to
include all data, regardless of the WHERE condition. For instance, adding ALL
to the previous statement returns all of the ZIP groups, not just those in Kentucky:

SELECT ZIP
FROM Customers
WHERE State = 'KY'
GROUP BY ALL ZIP

As is, the two clauses are in
conflict, and you probably wouldn’t use ALL in this way. ALL comes in handy
when you use an aggregate to evaluate a column. For example, the following
statement counts the number of customers in each Kentucky ZIP, while also
displaying other ZIP values:

SELECT ZIP, Count(ZIP) AS KYCustomersByZIP
FROM Customers
WHERE State = 'KY'
GROUP BY ALL ZIP

The resulting groups comprise all
ZIP values in the underlying data. However, the aggregate column
(KYCustomersByZIP) would display 0 for any group other than a Kentucky ZIP.

Remote queries don’t support GROUP BY ALL.

#5: Limit data after it’s grouped

The WHERE clause (# 3) evaluates
data before the GROUP BY clause does. When you want to limit data after it’s
grouped, use HAVING. Often, the result will be the same whether you use WHERE
or HAVING, but it’s important to remember that the clauses are not interchangeable. Here’s a good
guideline to follow when you’re in doubt: Use WHERE to filter records; use
HAVING to filter groups.

Usually, you’ll use HAVING to evaluate a group using an
aggregate. For instance, the following statement returns a unique list of ZIP
codes, but the list might not include every ZIP code in the underlying data
source:

SELECT ZIP, Count(ZIP) AS CustomersByZIP
FROM Customers
GROUP BY ZIP
HAVING Count(ZIP) = 1

Only those groups with just one
customer make it to the results.

#6: Get a closer look at WHERE and HAVING

If you’re still confused about when
to use WHERE and when to use HAVING, apply the following guidelines:

  • WHERE comes before
    GROUP BY; SQL evaluates the WHERE clause before
    it groups records.
  • HAVING comes after
    GROUP BY; SQL evaluates HAVING after
    it groups records.

#7: Summarize grouped values with aggregates

Grouping data can help you analyze
your data, but sometimes you’ll need a bit more information than just the
groups themselves. You can add an aggregate function to summarize grouped data.
For instance, the following statement displays a subtotal for each order:

SELECT OrderID, Sum(Cost * Quantity) AS OrderTotal
FROM Orders
GROUP BY OrderID

As with any other group, the SELECT
and GROUP BY lists must match. Including an aggregate in the SELECT clause is
the only exception to this rule.

#8: Summarize the aggregate

You can further summarize data by
displaying a subtotal for each group. SQL’s ROLLUP operator displays an extra
record, a subtotal, for each group. That record is the result of evaluating all
the records within each group using an aggregate function. The following
statement totals the OrderTotal column for each group:

SELECT Customer, OrderNumber, Sum(Cost * Quantity) AS OrderTotal
FROM Orders
GROUP BY Customer, OrderNumber
WITH ROLLUP

The ROLLUP row for a group with two
OrderTotal values of 20 and 25 would display an OrderTotal of 45. The first
record in a ROLLUP result is unique because it evaluates all of the group
records. That value is a grand total for the entire recordset.

ROLLUP doesn’t support DISTINCT in aggregate functions or
the GROUP BY ALL clause.

#9: Summarize each column

The CUBE operator goes a step further than ROLLUP by
returning totals for each value in each group. The results are similar to
ROLLUP, but CUBE includes an additional record for each column in the group.
The following statement displays a subtotal for each group and an additional
total for each customer:

SELECT Customer, OrderNumber, Sum(Cost * Quantity) AS OrderTotal
FROM Orders
GROUP BY Customer, OrderNumber
WITH CUBE

CUBE gives the most comprehensive
summarization. It not only does the work of both the aggregate and ROLLUP, but
also evaluates the other columns that define the group. In other words, CUBE
summarizes every possible column combination.

CUBE doesn’t support GROUP BY ALL.

#10: Bring order to summaries

When the results of a CUBE are
confusing (and they usually are), add the GROUPING function as follows:

SELECT GROUPING(Customer), OrderNumber, Sum(Cost * Quantity) AS OrderTotal
FROM Orders
GROUP BY Customer, OrderNumber
WITH CUBE

The results include two additional
values for each row:

  • The value 1 indicates that the value to the left
    is a summary value–the result of the ROLLUP or CUBE operator.
  • The value 0 indicates that the value to the left
    is a detail record produced by the original GROUP BY clause.

Subscribe to the Data Insider Newsletter

Learn the latest news and best practices about data science, big data analytics, artificial intelligence, data security, and more. Delivered Mondays and Thursdays

Subscribe to the Data Insider Newsletter

Learn the latest news and best practices about data science, big data analytics, artificial intelligence, data security, and more. Delivered Mondays and Thursdays