Running totals in SQL Server queries

Arthur Fuller demonstrates how running totals are simple to create in SQL Server queries once you understand the requirements.

You're tasked with creating a SQL Server query that contains running totals, ordered by the moment of each transaction. The classic example is a bank account.

For any given account, you sum the deposits (positive amounts) and withdrawals (negative amounts) up to a given point in time. After each transaction, you want to know the current balance. Listing A creates a simple example of such a table.
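Listing A itself isn't reproduced here, but a minimal sketch of such a table, with column names inferred from the query later in this tip, might look like this:

-- A hedged reconstruction of Listing A; the actual listing may differ.
CREATE TABLE dbo.bankaccount (
    transactionid INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    transactiondatetime DATETIME NOT NULL DEFAULT GETDATE(),
    amount MONEY NOT NULL
);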

Here are sample rows:

transactionid  transactiondatetime      amount
1              2006-11-03 02:33:42.340  10000.00
2              2006-11-03 02:34:50.467  -500.00
3              2006-11-03 02:35:04.857  250.00
4              2006-11-03 02:42:19.763  -124.25

Since the date column has a default, all you need to supply is a few amounts. The example keeps it simple, assuming only one bank account.
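For instance, inserts like the following (assuming the table sketched above) would produce the sample rows:

-- Only the amount is supplied; transactiondatetime takes its default.
INSERT INTO dbo.bankaccount (amount) VALUES (10000.00);
INSERT INTO dbo.bankaccount (amount) VALUES (-500.00);
INSERT INTO dbo.bankaccount (amount) VALUES (250.00);
INSERT INTO dbo.bankaccount (amount) VALUES (-124.25);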

Now you can create the query that computes the current balance. Since you are recording deposits and withdrawals in the same column, as positive and negative amounts respectively, the sum is straightforward: to derive the current balance, you sum every transaction up to and including the current one.

The following query accomplishes this:

SELECT
    transactionid,
    transactiondatetime,
    amount,
    (SELECT SUM(amount)
     FROM dbo.bankaccount AS D1
     WHERE D1.transactiondatetime <= D0.transactiondatetime) AS balance
FROM dbo.bankaccount AS D0

Running the query produces the following result set:

transactionid  transactiondatetime      amount     balance
1              2006-11-03 02:33:42.340  10000.00   10000.00
2              2006-11-03 02:34:50.467  -500.00    9500.00
3              2006-11-03 02:35:04.857  250.00     9750.00
4              2006-11-03 02:42:19.763  -124.25    9625.75

As this example demonstrates, running totals are simple to create once you understand the requirements. The example presented assumes that the table contains only one account, but it's easy to deal with many accounts: just add a column for BankAccountNumber and a WHERE predicate that specifies the account of interest.
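A hedged sketch of that multi-account variant; the BankAccountNumber column and the sample account number are illustrative assumptions, not part of the original table:

SELECT
    transactionid,
    transactiondatetime,
    amount,
    (SELECT SUM(amount)
     FROM dbo.bankaccount AS D1
     WHERE D1.BankAccountNumber = D0.BankAccountNumber
       AND D1.transactiondatetime <= D0.transactiondatetime) AS balance
FROM dbo.bankaccount AS D0
WHERE D0.BankAccountNumber = 1001;  -- hypothetical account of interest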

You can also turn this example upside down and create a running difference (as in an inventory count). You begin with an inventory of 1,000 and then subtract the various outgoing transactions against it.
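A minimal sketch of that inventory variant, assuming a hypothetical dbo.inventorymovement table in which each row records units going out:

-- Running difference: opening count of 1,000 minus everything issued so far.
SELECT
    transactionid,
    transactiondatetime,
    quantity,
    1000 - (SELECT SUM(quantity)
            FROM dbo.inventorymovement AS D1
            WHERE D1.transactiondatetime <= D0.transactiondatetime) AS unitsonhand
FROM dbo.inventorymovement AS D0;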

There are two advantages to such a query:

  • You don't have to store the results. When scoped by an account number or similar foreign key, performance can be lightning fast.
  • You end up with a transaction log that can be inspected row-by-row. If a bug turns up, you will be able to isolate the particular transaction that caused it.

Comments
zaim_raza

I have a requirement to calculate a running total with a group by, like the following:

PeriodID  City  Amount  RunningTotal
1         a     5       5
1         a     10      15
1         a     2       17
2         b     2       2
2         b     5       7

Simply put, on a change of group the running total should reset, as shown above.
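One hedged way to get the reset zaim_raza describes is to scope the article's correlated subquery by the grouping columns; the dbo.citytotals table and its RowID ordering column are assumptions for illustration:

SELECT PeriodID, City, Amount,
       (SELECT SUM(D1.Amount)
        FROM dbo.citytotals AS D1
        WHERE D1.PeriodID = D0.PeriodID
          AND D1.City = D0.City
          AND D1.RowID <= D0.RowID) AS RunningTotal  -- resets per group
FROM dbo.citytotals AS D0
ORDER BY D0.PeriodID, D0.City, D0.RowID;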

johndoering

When using the running total in MS Access, I have run into issues with large tables. The running sum is recalculated for each line of data. I have sub-second query response for under 1,000 records. The time for the sum calculation rises sharply after 5,000 records, to as much as an hour for 30,000.

DBAdmin

A correlated sub-query is demonstrably the worst for performance, but some "experts" call it "lightning fast" or a "Best Practice". Check out this thread - http://database.ittoolbox.com/groups/technical-functional/sql-server-l/to-get-a-summarized-cumulative-total-in-a-sql-statement-833822 The article refers to "a transaction log that can be inspected row by row" being a result of this query. Huh?!? There is no transaction log for a SELECT statement, and Microsoft provides no tool to inspect the transaction log. Why is a tip author so badly misinformed?

tequilla

The query has to calculate the sum for every returned row.

Tony Hopkinson

This is an argument for not having a balance column, isn't it? I totally disagree with the idea that you don't have to store it; if you don't, that means you have to calculate it every time, and balance is quite a useful number when reporting on balances! A lot of extra effort to save 8 bytes a record. Not to mention the audit problem: with a calculated balance, every time you run it the balance always adds up. Recording it for each transaction would highlight problems; this approach would hide them.

DBAdmin

This is another example of this author posting SQL code with terrible performance characteristics. A correlated sub-query, as shown in this tip, offers the poorest performance among various options for doing running totals.

Mannion

DBAdmin, With respect to your second and third paragraphs, I believe Mr. Fuller is using the term 'transaction log' in this article generically, rather than in the SQL Server-specific meaning of the phrase. In this case, the 'transactions' are deposits and withdrawals from a bank account, and the 'log' is the chronological record of account activity returned by the query. I may be misinterpreting this, myself, but this seems to make the most sense in the current context. -Michael

Mannion

All, It seems that we all agree that the solution proposed in the article produces the correct logical result, but is not scalable due to the correlated subquery in the SELECT clause. In light of this, it would be helpful to find a better solution.

I ventured to do this using SQL Server. I have only been using this platform for 3 years, however, and quickly ran into the limits of my skills. As a long-time Oracle user, however, I was able to find a performant solution on that database. I am wondering whether anyone with more SQL Server prowess knows how to translate this into an equivalent SQL Server-based solution. To solve the problem on Oracle, I used the LAG() analytic function:

SELECT transactionid
      ,transactiondatetime
      ,amount
      ,amount + LAG(amount) OVER (ORDER BY transactiondatetime) balance
FROM bankaccount

In this example, the LAG() function looks back to the value in the previous row. By adding the current 'amount' value to the previous row's 'balance' value, we have a true running total. On my test server, this solution took 0.12 seconds, whereas the original query (with the subquery) took 19.14 seconds.

The key difference between the two solutions is that the original requires a recursive operation that geometrically slows the query as the size of the bankaccount table grows. The analytic (LAG) solution requires only one full pass through the table (or, in my case, through a covering index on all table columns). The execution plan returned by AUTOTRACE in SQL*Plus bears this out.

So my question is: does SQL Server have anything like the Oracle LAG() function that we can use to produce an equivalently scalable solution?
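For what it's worth, versions of SQL Server from 2012 onward (well after this thread) added windowed aggregates, which answer Mannion's question directly: a single-pass running total with no correlated subquery. A minimal sketch against the article's table:

-- Running total via a windowed aggregate (SQL Server 2012+). With an ORDER BY
-- in the OVER clause, SUM() defaults to the frame RANGE UNBOUNDED PRECEDING
-- AND CURRENT ROW, i.e., a running total. Ties on the sort key are summed
-- together, which is fine here because the datetimes are unique.
SELECT transactionid,
       transactiondatetime,
       amount,
       SUM(amount) OVER (ORDER BY transactiondatetime) AS balance
FROM dbo.bankaccount;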

kbmosher

I tested this query with an existing check register with 13,000 records, which, believe me, is peanuts for any business. I watched it run for a while, then came back to this article, clicked on comments, found this thread, and wrote this post - and I still have an hourglass on the query.

gsquared

Yes, it's impractical to run on huge tables. In those cases, you need to run it on a subset. For example:

SELECT dateadd(day, -1 * datepart(day, getdate()), getdate()) AS [Date],
       'Starting Balance' AS [Transaction],
       NULL AS amount,
       SUM(amount) AS Balance
FROM Transactions
WHERE [Date] < dateadd(day, -1 * datepart(day, getdate()), getdate())
UNION ALL
SELECT [Date], [Transaction], amount,
       (SELECT SUM(amount)
        FROM Transactions t2
        WHERE [Date] BETWEEN dateadd(day, -1 * datepart(day, getdate()), getdate()) AND t1.[Date])
       + (SELECT SUM(amount)
          FROM Transactions
          WHERE [Date] < dateadd(day, -1 * datepart(day, getdate()), getdate()))
FROM Transactions t1
WHERE [Date] > dateadd(day, -1 * datepart(day, getdate()), getdate())
ORDER BY [Date]

That will give you one month's data at a time. Of course, adding in some parameters and running it as a proc/function, so you can select the month you want to look at, will help. On a very large table, you might need to precalculate the beginning-of-month values, for performance purposes, and store those in a separate table.

hog43

What would be some other options to do this? Or are you simply saying that this should not be done when there is a large recordset?

artful

You are quite right as to my intended use of the term, and I apologize to all for confusing my use with the more precise SQL use of the term. Thanks, Arthur

DBAdmin

Possibly, maybe even probably. If so, then the whole statement is nonsense because every "running total" solution would do the same thing. That's the very essence of a "running total" query.

artful

An ingenious solution to the problem. MS SQL has no equivalent to the LAG() function, but SQL 2005 does have a ROW_NUMBER() function that I might be able to exploit to do essentially the same thing as LAG(). I'll play around and let this thread know if I succeed (or fail). Thanks, Arthur

gsquared

I ran it on a table on a relatively busy server on 1,000 records; it took less than 1 second. Ran it on 10,000 records; it took a couple of minutes. Tried to run it on 1,000,000 records; gave up after 20 minutes and cancelled the query. So, yeah, it's useful on small record sets (personal finance), or in cases where you're looking at a small subset of a table (as per my prior response). It's a useful technique in the right circumstances, but you have to understand the limits.

chapman.tim

It really depends on what he means by impractical... if it is not run in an OLTP environment and the table is highly indexed, then it is not necessarily impractical. Another way of doing this is to use temp tables and/or cursors. A correlated sub-query (which is what this article is about) is really just an iterative query anyway... but they are very, very handy.
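A hedged sketch of the cursor alternative chapman.tim mentions, written against the article's dbo.bankaccount table; it makes one ordered pass at the cost of procedural code:

-- Accumulate the balance row by row with a cursor; #balances is a
-- hypothetical temp table used only to hold the results.
DECLARE @id int, @amt money, @running money;
SET @running = 0;

CREATE TABLE #balances (transactionid int, balance money);

DECLARE c CURSOR FAST_FORWARD FOR
    SELECT transactionid, amount
    FROM dbo.bankaccount
    ORDER BY transactiondatetime;

OPEN c;
FETCH NEXT FROM c INTO @id, @amt;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @running = @running + @amt;
    INSERT INTO #balances (transactionid, balance) VALUES (@id, @running);
    FETCH NEXT FROM c INTO @id, @amt;
END;
CLOSE c;
DEALLOCATE c;

SELECT transactionid, balance FROM #balances ORDER BY transactionid;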

hsp

Arthur, Did you manage to find a solution? Best regards Henrik Staun Poulsen www.jagt.udstyr.dk

artful

You all are quite right. I should have pointed out that I was thinking in terms of OLTP, and more exactly in terms of the most recent month's records rather than the entire history (which the OLTP DB would be better positioned to handle). Even that proviso is insufficient given a large DB, say one adding 10,000 rows per day. Thanks to you all for jumping on my toes and keeping me sharp. I'll be more careful with provisos in future tips. Arthur
