Data Management

Using Grouping Sets in SQL Server 2008

A neat new feature in SQL Server 2008 is the GROUPING SETS clause, which allows you to easily specify combinations of field groupings in your queries to see different levels of aggregated data. Today we'll look at how you can use the new SQL Server 2008 GROUPING SETS clause to aggregate your data.

 

GROUP BY

The GROUP BY clause is a SQL language construct used to priovide summary data for column returned in a SELECT statement.  This functionality groups values from specified fields together, providing a single record of distinct values for each group.  To illustrate the how GROUP BY works, lets look at an example.  Use the script below to create and load the SalesHistory table.

IF OBJECT_ID('SalesHistory')>0
DROP TABLE SalesHistory;
GO
CREATE TABLE [dbo].[SalesHistory]
(   [SaleID] [int] IDENTITY(1,1) NOT NULL  PRIMARY KEY,
    [Product] [char] (150) NULL,
    [SaleDate] [datetime] NULL,
    [SalePrice] [money] NULL
)
GO
DECLARE @i SMALLINT
SET @i = 1
WHILE (@i <=100)
BEGIN    INSERT INTO SalesHistory
    (Product, SaleDate, SalePrice)
    VALUES ('Computer', DATEADD(mm, @i, '3/11/1919'), DATEPART(ms, GETDATE()) + (@i + 57))
    INSERT INTO SalesHistory
    (Product, SaleDate, SalePrice)
    VALUES ('BigScreen', DATEADD(mm, @i, '3/11/1927'), DATEPART(ms, GETDATE()) + (@i + 13))
    INSERT INTO SalesHistory
    (Product, SaleDate, SalePrice)
    VALUES ('PoolTable', DATEADD(mm, @i, '3/11/1908'), DATEPART(ms, GETDATE()) + (@i + 29))
    SET @i = @i + 1
END

I can use the GROUP by clause to retrieve the average SalePrice per Product.

SELECT Product, AVG(SalePrice) AS AverageSalePrice
FROM SalesHistory
GROUP BY Product

Notice that the GROUP BY clause returns a distinct Product list, followed by the average of each SalePrice value.

The only drawback of the GROUP BY clause is that it only really allows you to specify aggregations for a single set of database columns.  The new SQL Server 2008 GROUPING SETS clause expands upon the GROUP BY functionality allowing you to specify different field combinations to return aggregate data.

GROUPING SETS

This functionality can enhance reporting requirements and data analysis by allowing you to retrieve aggregation data through one statement, rather than several distinct queries.  GROUPING SETS also allows for "Grand total" data for the entire set of data, or just for sections of aggregations.

The following query uses the GROUPING SETS clause on the SalesHistory table. This query returns 2 different sets of aggregations; one at the Product level, and a "Grand total" level, denoted by the () in the GROUPING SETS clause.

SELECT Product, SUM(SalePrice) AS TotalSales, COUNT(*) AS SaleCount
FROM SalesHistory
GROUP BY GROUPING SETS((Product),())
ORDER BY Product DESC, TotalSales DESC, SaleCount DESC

Notice the last row returned in the above screenshot. This is the "Grant total" record I mentioned above.

In the following query, I use three separate field sets in my GROUPING SETS clause. Doing so create three totally different sets of aggregations; one by Product, one by the SalesTier (calculated via a subquery), and one "Grand total" column. Note that the order in which the grouping sets appear in the GROUPING SETS clause is not important.

SELECT
Product, SalesTier, TotalSales = SUM(SalePrice), AverageSalePrice = AVG(SalePrice)
FROM (
SELECT
Product,
SalesTier =
	CASE 		WHEN SalePrice BETWEEN 0 AND 500 THEN 'Tier 1'
		WHEN SalePrice BETWEEN 501 AND 750 THEN 'Tier 2'
		WHEN SalePrice BETWEEN 751 AND 1000 THEN 'Tier 3'
		WHEN SalePrice > 1000 THEN 'Tier 4'
	END,
SalePrice
FROM SalesHistory
) a
GROUP BY
	GROUPING SETS((Product), (SalesTier), ())
ORDER BY Product DESC, SalesTier DESC

You can achieve the same results as the above query using typical TSQL constructs, but it is much more cumbersome to do so, as I show in the following script.

IF OBJECT_ID('tempdb..#SalesResults') IS NOT NULL
DROP TABLE #SalesResults
CREATE TABLE #SalesResults
(
	Product VARCHAR(10),
	SalesTier

VARCHAR(10),

Next Time
	TotalSales MONEY,
	AverageSalePrice MONEY
)
INSERT INTO #SalesResults(Product, SalesTier, TotalSales, AverageSalePrice)
SELECT
	Product, NULL, SUM(SalePrice), AVG(SalePrice)
FROM
	SalesHistory
GROUP BY Product
UNION ALL
SELECT
	NULL,
	SalesTier,
	SUM(SalePrice),
	AVG(SalePrice)
FROM
(
	SELECT
	SalesTier =
		CASE
			WHEN SalePrice BETWEEN 0 AND 500 THEN 'Tier 1'
			WHEN SalePrice BETWEEN 501 AND 750 THEN 'Tier 2'
			WHEN SalePrice BETWEEN 751 AND 1000 THEN 'Tier 3'
			WHEN SalePrice > 1000 THEN 'Tier 4'
		END,
	SalePrice
	FROM SalesHistory
) a
GROUP BY SalesTier
UNION ALL
SELECT
	NULL, NULL, SUM(SalePrice), AVG(SalePrice)
FROM SalesHistory
SELECT *
FROM #SalesResults

Next Time

Today I took a look at the usefulness of the new GROUPING SETS clause in SQL Server 2008. The results of the GROUPING SETS clause can be achived in other ways, but it takes more programming work to do so. Next time I will take a look at SQL Server 2008 enhancements to the CUBE and ROLLUP; two constructs that can produce similar results to the GROUPING SETS clause, but in a different manner.

About Tim Chapman

Tim Chapman is a SQL Server MVP, a database architect, and an administrator who works as an independent consultant in Raleigh, NC, and has more than nine years of IT experience.

Editor's Picks

Free Newsletters, In your Inbox