Software

We need a better solution: How would you extract the top n records in Excel?

Excel's strange advanced filter behavior has me scratching my head. Have you got a better solution?
Excel's advanced filtering feature is a great tool for extracting just the data you want to see, but it needs a bit of coaxing. Knowing just the right formula is often the key to working with this feature. For instance, how would you return the top n records in a sheet? My solution is to use a formula criterion with the advanced filter. First, you need a criteria range. It can be as simple as two cells, but in this example, the criteria range A1:F4 already exists, so lets see what happens! (This data is from Northwind, the example database that comes with Access.) Now, let's suppose I want to extract the five highest priced items in column C, Unit Price. In that case, the criteria range is C1:C2.

z

In cell C2, enter the following formula:

=C6>=LARGE($C$6:$C$82,5)

This formula will return true or false. It compares the current cell in the list (column C) to the results of the LARGE() function. When applying this technique to your own workbook, be sure that the C6 reference is relative and the range argument is absolute. The C6 reference refers to the first cell in the column you're filtering and the $C$6:$C$82 reference identifies the entire range of values in the same column. Replace 5 in the LARGE() function to change the n factor. For instance, if you want to see the top 3 values, use 3; to see the top 10 values, use 10, and so on.

Now you're ready to run the filter to see the results, as follows:

  1. Select cell C6 (or any cell in C6:C82).
  2. Then, from the Data menu, choose Filter and then Advanced Filter. In Excel 2007 and 2010, click the Data tab. Then, click Advanced in the Sort & Filter group.
  3. In the Action section, specify whether you want to filter in place or copy to another location. For this example, we'll filter in place.
  4. The list range should be correct (if you selected a cell in the list before executing the feature).
  5. Enter $C$1:$C$2 for the criteria range.
  6. Click OK.

Oh… wait… it didn't work. Instead of extracting the five records you expected, it hid them all! Now, here's the trick that will make it work, but I don't know why. Delete the heading text in C1 and repeat steps 1 through 6. Now, it works. Frankly, this puzzles me as it violates the rule that the criteria and list header ranges must match!

The above fix is quick and easy, but if you don't know about it, you're just stuck. You could play around with different ranges and formulas all day and still not be successful! You simply can't present this type of solution to most users—and not end up fielding calls on it later and by then, you might not remember the trick either!

We need a better solution… A better solution would leave the criteria range headings intact and still work! Is this possible? I'd like to hear your techniques. Remember, you don't want to force users to remember that in this particular case… they have to do something special like delete header text! Working with the list and criteria ranges, already in place, how would you get the desired results? Or, if that's not possible, come up with a completely different solution--just remember that I'm looking for a solution that users can remember and implement.

The first one to come up with a reasonable working solution, given the guidelines above, will win something cool from the techrepublic.com stash of cool PR stuff, like a mug or a mouse pad! Impress your colleagues—See what TR sent me for being so smart????

About

Susan Sales Harkins is an IT consultant, specializing in desktop solutions. Previously, she was editor in chief for The Cobb Group, the world's largest publisher of technical journals.

22 comments
biaogewang
biaogewang

coz it's formula, it may involve more than 1 column and therefore, you either enter nothing in the column head or you enter anything that's not a heading in the filter list. The formula already includes the information on which column it relates to. Say, i have a category ID instead of product ID in column A in your table, i may use the formula "=SUMPRODUCT(($A$6:$A$82=A6)*($C$6:$C$82>C6))=0" in C2 to filter the highest price in each category, and you can see that this formula involves 2 columns, & that's why a column heading should not be included. The criteria range need not be of the same "width" or number of columns as the filter list. Criteria which use formula is seen as criteria that does not relate to any particular column in excel. If you use the same column headings in the criteria range as in the filter list, you may want to add any criteria that uses formula at the very end of the criteria range (add new columns to the criteria range).

RU7
RU7

PHOOEY! This is my first code post and I couldn't figure out how to format the code in the post. Add a sheet after the one with the data. Rename the new sheet "View High Priced Records". Then add the below code to the new worksheet. Sorting on column "C", the "Produce Name" column is just incase there are 2 or more items with the same high price. It might be better to delete the rows greater than t instead of hiding them. This way, every time you go to this new sheet you are asked how many rows you want to see, with 5 as the default. Cancelling exits without making any changes. Private Sub Worksheet_Activate() Dim sh As Worksheet Dim r As Range Dim t As Variant On Error GoTo HandleError t = InputBox( _ Prompt:="Enter the number of top records to view: ", _ Default:=5) If (Not (IsNumeric(t))) Then Exit Sub ActiveSheet.Rows.Clear Set sh = Worksheets("Sheet1") Set r = sh.UsedRange r.Copy (ActiveSheet.Range("A1")) ActiveSheet.UsedRange.Sort _ Key1:=ActiveSheet.Columns("C"), _ Order1:=xlDescending, _ Key2:=ActiveSheet.Columns("B"), _ Order2:=xlAscending, _ header:=xlYes For Each r In ActiveSheet.UsedRange.Rows If (r.Row() > t) Then r.Hidden = True Next r HandleError: Debug.Print Err.Number & "..." & Err.Description End Sub

starr88
starr88

Easy enough if you know how to deal with Excel arrays. Using the sample sheet shown, for the top 3 values, do the following: [1] select cells C2:C4 [2] in the formula bar, type =LARGE(Products!C6:C82,{1;2;3}) DON'T PRESS ENTER YET and be careful about the round braces around the 1;2;3 and the semicolons. [3] Now press Ctrl+Shift+Enter to enter the array formula. The formula will now have the surrounding braces. Trap: You cannot manually type the braces; you must enter the formula (without the braces) using the Ctrl+Shift+Enter. To get the top 5 values, select five empty cells, type the formula but change the "1;2;3" to "1;2;3;4;5", then press Ctrl+Shift+Enter. Trap: you cannot edit array formulas in a straightforward manner either. Learn more about array formulas using the help system.

Weeping
Weeping

Add a new column called 'Rank'. Using the RANK formula, assign a rank for each row, referring to the Unit Price column (see Excel help for more info.). Under the Rank criteria heading, type '

merakli
merakli

I would just sort by price (desc) and pick the first five lines.

Mike
Mike

It seems to me that we have overlooked the inbuilt functionality within Excel. If the sheet has filters, choose the filter from the ribbon, choose number filters and then select the number of records (top, bottom etc.) that you wish to see. If there are more entries with the same value than 10 (e.g. 25 items are the same price), you will see all 25. Why get into complicated formulas?

vladnik
vladnik

Create a pivot table. Select all the field that you want displayed and then the magic starts: you can either sort by the column value by which you wish to sort (if you don't want to commit to a top 10 or top 20), or create a custom filter that will display only the top/bottom x items (set this by clicking PivotTable on the PivotTable toolbar, and then click Sort and Top 10).

ddalgleish
ddalgleish

Use a formula as the heading in cell C1, so it includes a space character when a formula returns TRUE or FALSE in C2. For example: =IF(ISLOGICAL(C2),C5&" ",C5) It's okay to have a heading when using a formula in the criteria range -- it just can't be the same as the original heading.

TobiF
TobiF

Instead of getting a boolean value directly from your formula, you can evaluate a comparison. To select the 4 largest values in column B, you can enter something like this: =">="&LARGE(B5:B65536;4) I don't know if this is an intended feature of the filtering function (haven't played too much with it), but the filter will update only when I reapply it.

gkeramidas
gkeramidas

i do an autofilter, select top 10 and change the 10 to a 5.

TobiF
TobiF

If the criteria range has a title, which equals a title in the table, then entries in the table must match (or satisfy, in case of comparisons) whatever is entered in the criteria. And the formula's result is either TRUE or FALSE. So if the criteria header matches a table header, you'd be searching for either TRUE or FALSE values...

ssharkins
ssharkins

Good, simple, elegant... gets the job done! Sometimes, we do over think a problem, don't we! However, the filters are still good when you want to create a list, etc.

ssharkins
ssharkins

The problem persists, I just used a poor example, because there's an AutoFilter that does the same thing -- but the problem persists with the formulas and there won't always be an AutoFilter. My fault -- I'll try to be more thorough next time.

Mark W. Kaelin
Mark W. Kaelin

gkeramidas send a private message to Mark Kaelin - you've earned a piece of TechRepublic swag.

ssharkins
ssharkins

Yes, that is an excellent suggestion and it works beautifully. That is the solution I think I would go with too. The feature is flexible and will handle most top, bottom and even percentages! Why work harder than you need to and users will have no problem implementing it as needed, once shown how it works. I am curious about the advanced filter -- why it works only after deleting the header text. I've done a bit of research and can't find a thing. Can someone tell me if this is wad and explain it, or is it a bug?

fledis
fledis

Yes, but sometimes we can't sort. For example, if there is a chart drawn from data.

fledis
fledis

If the column heading in criteria range matches one in the data table, it tells Excel to look for exact value shown in criteria- in this case for text "FALSE". If we need Excel to evaluate cell by formula =a>b, criteria heading must be different from the data table. By the way, I usually use function RANK. That way I can extract not only Top-10 but also, for example, from 5th to 10th largest values and so on.

ddalgleish
ddalgleish

It's pretty well hidden in Excel help, but check the Help topic: Filter by using Advanced Criteria. It says, if you're using a formula "Do not use a column label for criteria labels; either keep the criteria labels blank or use a label that is not a column label in the range" There are examples of Excel Advanced Filter Criteria on my Contextures website.

ssharkins
ssharkins

I should know that -- you would think I'd know that... how odd is that though! Thank you for solving this puzzle for me. They need to fix that -- that's just bizarre... I'm glad I posted this -- that was just driving me nuts. I wish I'd used a better example though -- there are a number of formulas that don't have an AutoFilter fix, but I'm glad to have this mystery solved. Thank you!

TobiF
TobiF

If you make your formula evaluate to the comparison text you would have entered in a normal "advanced filter", then it works :) However, you need to reapply the filter every time. Just recalculating the worksheet after changes have been made is not enough...

Editor's Picks