Software

Two ways to return a unique list in Excel

Filtering data to produce a unique list of entries is a common task in Excel. Here are two methods of getting the job done: manually and with a little bit of VBA automation.

Lists work their way into just about everything we do. Ordinarily, you might not think of worksheet data as a list, but that data can quickly turn into one, depending on your needs. That need becomes a bit more complicated if you want to reduce the data to a unique list. Fortunately, Excel's filter feature can create a quick list of existing data and return only unique items. You can create the list manually or automate the task using VBA.

Note: This article is also available as a PDF download.

Manually, the easy way

With just a few clicks, Excel's Data feature can create a unique list from a list of values. The short list of order information in Figure A is from the Access sample file Northwind.mdb -- specifically, it's from the Order Details table. Notice that column H contains a unique list of order numbers from column A.

Figure A

To create the unique list, follow this simple three-step process:

  1. Select the first cell in the source list. In this case, that's cell A1.
  2. Choose Filter from the Data menu. Then, select Advanced Filter from the submenu. Excel 2007 users will find the Filter options on the Data tab in the Sort And Filter group.
  3. In the Advanced Filter dialog box, click the Copy To Another Location option. Fill in the Copy To range. A single cell, such as H1 is adequate; Excel interprets a single cell as the top cell in the resulting list. Check the Unique Records Only option, as shown in Figure B, and then click OK.

Figure B

If you need a quick list and it doesn't matter where the list is, use the built-in feature. However, if you want to use the list in some way, the quick way might not be adequate. Suppose you want to display the unique list in a combo box, as shown in Figure C. To do so, you could set the combo box control's Row Source property to the appropriate range. In this case, that's H2:H7, as shown in Figure D.

Figure C

Figure D

If you know that the list will never change, this solution works. Execute it one time and move on. However, this approach isn't practical if the source data changes.

VBA, the automated way

A source list that changes creates a special problem. Creating a new list of unique items isn't hard, but you don't want to update the combo control's Row Source property every time the original list changes. In this case, it's best to automate the entire process. To do so, we'll use VBA's AdvancedFilter method.

First, create two ranges: the list's source data and a target range for the unique list. Select the source data; in this case, that's cells A2:A18. From the Insert menu, choose Name and then select Define. Excel will automatically assume the content of A1, the string OrderID, as the name. Click OK. Next, select cell H1 and name it UniqueList using the same process.

Next, you need a user form and a combo box. Press Alt + F11 to launch the Visual Basic Editor (VBE). From the Insert menu, choose UserForm. If necessary, click the Toolbox button to launch the Toolbox and drag a combo box control to the user form. Name the combo box control cboUniqueList. Save the user form as UserForm1.

Now, you're ready to enter the code that automates the list. Choose Module from the Insert menu. Then, enter the function in Listing A. This function creates a unique list from the data in the OrderID named range, populates cboUniqueList with that list, displays UserForm1, and then deletes the unique list from the sheet.

Listing A

Function UniqueList()
  'Populate control with
  'unique list.
   Range("OrderID").AdvancedFilter Action:=xlFilterCopy, _
   CopyToRange:=Range("UniqueList"), Unique:=True
  'Set combo control's Row Source property.
  Range("UniqueList").Activate
  UserForm1.cboUniqueList.RowSource = Selection.CurrentRegion.Address
  'Display user form.
  UserForm1.Show
  Selection.CurrentRegion.Clear
End Function

The AdvancedFilter method automates the feature discussed in the previous section. This method uses the following syntax:

range.AdvancedFilter(action, criteriarange,copytorange, unique)
where range is a range object. Table A lists the method's other arguments. After creating the list, the code sets the combo box control's Row Source property to the unique list in column H and then opens the user form with a populated combo control (Figure C).

Table A

Argument Explanation
action Required constant: xlFilterCopy or xlFilterInPlace. Both are self-explanatory. If you create the list in place, Excel doesn't delete items, it simply hides them.
criteriarange Optional variant that defines criteria used to filter the list.
copytorange Optional variant that specifies where VBA copies the list.
unique Optional variant that determines if the list contains only unique values. The default value is False.

A few noteworthy points

Because the code relies on the CurrentRegion property, be sure to locate the unique list in an out-of-the-way place where there's no chance that the resulting list will run into existing data. Avoid giving the unique list's range name more than one cell unless you know the exact size of the resulting list. The AdvancedFilter method's copytorange needs only one cell. If the source list grows, you'll need to update the named range's dimensions to include new items before running the function. A shrinking list doesn't present a problem. The code doesn't sort the list, but you could add that capability to the code. The sample code contains no error handling, so be sure to test it thoroughly within your application and accommodate potential errors.

About

Susan Sales Harkins is an IT consultant, specializing in desktop solutions. Previously, she was editor in chief for The Cobb Group, the world's largest publisher of technical journals.

54 comments
RTHJr
RTHJr

I could use this; but I run into CSV files of various number of rows that could use VBA to convert into another CSV (for import into a database). Is there a command/function in VBA to support an expanding list?

syssjj
syssjj

Thanks for any help available - I want to see the dups before removing them! Susan

rg470
rg470

While you cannot output the unique records to a different sheet using the manual method I have found that you can using the VB method by naming the sheets as part of the AdvFilter process, for example Sheet1.Range("OrderID").AdvancedFilter Action:=xlFilterCopy, _ CopyToRange:=Sheet2.Range("UniqueList"), Unique:=True

martinshort
martinshort

Although I also reckon the fastest way is probably a Pivot table (why on earth wasn't this included in the original piece???), sometimes it can be useful to use COUNTIF to identify the current number of identical entries. Say your list is in column A, in B1 type "=COUNTIF($A$1:A1,A1)". Now autofill this down column B. B2 will be =COUNTIF($A$1:A2,A2) B3 =COUNTIF($A$1:A3,A3) and so on. This will place a 1 beside the first time an item makes an appearance, 2 for the second, 3 for the third and so on. If you filter column B by 1 and you will extract all the 1st instances of an entry. You can also filter by any other number if you want to identify the duplicates.

mirossmac2
mirossmac2

When I started selling off my library of 5,000+ volumes, in 123 categories, I foolishly gave about 400 of them non-unique IDs. So, eg, "The life of Queen Victoria" might be ROY123 in the Royalty category and HIS123 in the history category. I think I've stabilized all of them by the idiot method (sorting on title and looking for nonunique IDs) but could this system be tweaked to reveal any I might have missed?

mattohare
mattohare

Thanks for this post. Funny thing about the MS-Office products, there is so much to them. I've been working with them since the late 80s (for Word) and early 90s (for Excel and Access), but there are still things that are new to me.

williams
williams

You could always use a pivot table

staffordd
staffordd

"How About VBA on a variable List Source I could use this; but I run into CSV files of various number of rows that could use VBA to convert into another CSV (for import into a database). Is there a command/function in VBA to support an expanding list? " Hi! I so, so DON'T understand this question. Can you restate it, tell me step by step what you are trying to do? And when I hear csv, warning flags start waving in my brain...because the csv format isn't smart, because if a comma or a quote appears in the data, it can interpret it as a delimiter and skew the data. I avoid CSV at all costs, and champion the None/Tab delimited file, with which I have had pretty much zero problems. But if csv is what you get, there should be a way to deal with it - if I can understand what you are trying to do with the file? Thanks! Dave :-)

pgurney
pgurney

If you want to find the duplicates in one column, in Office 2007, you could use the Conditional Formatting option to highlight (color) the duplicates. To use this option, create a new rule and select the option to format unique or duplicate cells (and then choose the duplicated cells option).

staffordd
staffordd

Hi Susan, You have only one choice in my opinion. Use Microsoft Access instead of Excel. The Find Duplicates Query wizard works a treat, and you can view all dupes before removing, and also adjust the query to change the records displayed, i.e. if you wanted to exclude certain records from your delete. Trust me, this works far better than anything in Excel. "I Love Data Remove Dups, but--- How to Just Highlight Dups? Thanks for any help available - I want to see the dups before removing them! Susan "

mattohare
mattohare

Say you have the stuff to check in column A, and it starts in row 2. I insert a blank column in B. In B2, i put a formula like =IF((A2=A1),"Dup","-") Then I drag/fill it down the full lenght of the data set. If the item duplicates the one above, Dup appears. Otherwise a dash. (I used to make it an empty string, but I wanted to make sure it wasn't just some empty cell that sneaked in. The thing to realise is that the first item won't show as "Dup". You can tweak the formula to show that as well.

martinshort
martinshort

A variation on my post above will do this. Say Column A is your column where your data is. eg A1 = FRED, A2=BOB, A3=GEORGE, A4=BOB. Set the conditional formatting for cell A1 to be a formula - "=COUNTIF($A:$A,A1)>1" and make the cell background = Red if the condition is true. Now copy this formatting down the entire column, so the conditional formatting in A2 is "=COUNTIF($A:$A,A2)>1" and so on. In the example above, A2 and A4 will both be displayed with a red background since they are identical.

Answers
Answers

Excel 2007 has a Remove Duplicates command on the Data ribbon. Select the range (or copy of the range) and click Remove Duplicates and you will be left with a unique list. I Love Office 2007!

staffordd
staffordd

"When I started selling off my library of 5,000+ volumes, in 123 categories, I foolishly gave about 400 of them non-unique IDs. So, eg, "The life of Queen Victoria" might be ROY123 in the Royalty category and HIS123 in the history category. I think I've stabilized all of them by the idiot method (sorting on title and looking for nonunique IDs) but could this system be tweaked to reveal any I might have missed? " Probably, but unfortunately without seeing the data, it's just not clear enough what the problem is to solve it. Obviously, it helps if each item HAS a unique ID. It sounds to me like you have applied "catalogue numbers" which is fine, but probably it would have been maybe better that if along WITH the catalogue number, you had applied, across the board, a "behind the scenes" unique ID to each item, that RELATED to your catalogue number. But I am totally guessing, I am sure you could use Access to make this data better, but it might well involve actually fixing the data at the row level until each item IS truly unique. I wish you luck however you solve it... D.

samsrents
samsrents

To the right of your list, assuming title is in column A and the data is sorted, you can put the following set of formulae in two columns. In first column, =if(a2=a1,1,0) and in second column =if(a1=a2,1,0). After putting in the formulae in both columns, do a copy of the entire range of two columns and then use Paste Special / Values to overwrite the formulae with the 0/1 values. Then you can sort by the two 0/1 columns to get all the duplicated titles / flush the unique ones out. Now you've found all the non-uniques and can do your cleanup.

victor.gutzler
victor.gutzler

You could insert a column for marking duplicate ids in front of the column containing the ids. For example, put the heading "FlagDups" in cell A1 and "IDs" in cell B1 with the list of ids below it. Copy the following formula into cells A2 and all the way down to the cell of the last row of ids: =IF(ISNA(VLOOKUP(B2,$B$1:B1,1,FALSE)),"Unique","Duplicate") Autofilter the columns to select out the Duplicates or the Unique values. Each cell containing the formula basically looks up the id value next to it in the list up to the previous id value.

mille383
mille383

Loads of good info about BOTH applications to be had here.

bondjames870
bondjames870

Not to sure what you are trying to say..I mean is it or is it not. Anyhow I know I am rambling but try to see it from someone reading it the first time without thinking about it first. ---------- bondjames Luwow Goldman

lwilliamson
lwilliamson

Is there a way in Excel (I don't know Access), to add a frequency count in combination with the unique filter? Using the example in the original article, I'd like to create a summary list that shows entry 10248 with the number "3" in the cell to the right. I read the comment re: COUNTIF, but I don't want to have to manually transfer that info to the summary list (my datasets are much larger). Any guidance is appreciated. Thanks.

staffordd
staffordd

Hi Of course, pivot table would be an option, but, in my opinion (again) that's a function of Excel that a lot of Excel users just don't know about or use, and it's not the easiest to teach or learn. I think pivot tables are fantastic, but, since I worked in Access for 12 years and only got proficient in Excel during the last 6, for me, they are a slow and complicated way to do what can be done with almost lightning speed with one Access query. There is a LOT of crossover in what Excel and Access can do. I use them both a lot, but I tend to do some tasks in one and others in the other, trying to play on their individual strengths. Of course, the VBA underlying is very similar too, and if you are good with that, then you can manipulate either to do much, much more. But not all users have VBA knowledge, or pivot table knowledge - and maybe, if you think about it, they shouldn't HAVE to. The tool is meant to hide the VBA, the user just sees the GUI and does tasks, and the VBA is "invisible". For those users for whom VBA and pivot tables are a mystery, using the methods outlined in the article, or learning the simple aggregate query in Access, are probably the simplest best options. People shouldn't be afraid of Access. OK, sure, it's more difficult to learn than Excel (maybe!) but you could just learn little bits of it at a time, you don't need to become an expert to take advantage of the power of the query! Just a thought... Cheers, Dave :-)

staffordd
staffordd

Hello I am going to make what could be considered to be a controversial comment. This article on the surface, looks great, it's very helpful, and it teaches us some good ways to do some needful tasks in Excel. But therein lay the problem. My answer to the original statement, which is "Two ways to return a unique list in Excel" my immediate response is - "do it in Access!!!!!". The reason I say this, is that to me, it's a question of using the RIGHT tool for the RIGHT job. Now, if I wanted to do financial calculations, sum lists of numbers, make charts, pivot tables, etc. I would always use Excel. But when I work with DATA, as in querying data (and this task IS a query, or a "view" of data - the view being "make a unique list" - that's the requirement) I use...a DATAbase. So it's my contention that while you CAN create a unique list in Excel, and this article shows you two different ways to do it - you are, uh, forgive me, wasting your time, because with about four clicks in Access, you can create a query that will return the unique records in a list. It's SO much quicker and easier in Access, that I would not ever, ever, ever DREAM of doing it in Excel. Now, if I needed to generate a long sequence of numbers, or add up a series of numbers - Excel in a natural. I use it all the time - for the right tasks. Finding the unique items in a list however, for me, isn't one of those tasks. In Access - this is what you have to do 1) Create a new query, design view 2) Add the table (use "Add Table") containing the list you want to make unique 2) Drag down the field from that table you are querying into the QBE grid 3) Switch to aggregate query view (push the "Epsilon" button). This causes Access to "group" the records 4) Run the query You are done. (Note - you can also make this query do a lot of other things - sum, average, count, etc. by pulling the field down twice, and specifying "Group By" in the first grid, and "Count" (or whatever) in the second. The aggregate query is a VERY powerful and useful tool, and is VERY quick compared to Excel (again, in my opinion). I don't in any, any way mean to denigrate what is presented here, because if you ONLY work in Excel, then these methods are excellent and very useful. But if you have the option to use Access, there are certain tasks, MANY tasks, that can be done much more quickly and efficiently in Access than they can be in Excel, and in my opinion, creating a unique list is absolutely one of those! Conversely, there are certain tasks that are very slow, time consuming and difficult to do in Access, that can be done VERY quickly in Excel - so it goes both ways!!! There's an old saying about using the right tool for the right job - you wouldn't use a screwdriver to tighten a bolt! So when I saw this, my snap answer was - do it in Access !!! Apologies to the author, no offense intended, excellent and useful piece for Excel users! Thanks, Dave :-)

staffordd
staffordd

Hello, it's me again, Dave... "Conditional Formatting Has Option to Format Duplicates" "If you want to find the duplicates in one column, in Office 2007, you could use the Conditional Formatting option to highlight (color) the duplicates. To use this option, create a new rule and select the option to format unique or duplicate cells (and then choose the duplicated cells option)." I agree, conditional formatting is a powerful tool...except, to my mind, except for the lovely visual aspect of being able to say, color all duplicates RED, the problem is...you can't QUERY. So in Access, I would identify the duplicates (using a query wizard that's already written for you) and then I would be able to update them directly by changing the query to an update query. In Excel, how do I "call up" the RED rows, and update them or change a value? So while it has some value in SHOWING you the duplicates, the advantage of the Access query is, you can SEE them...and then you can update them, or delete them, or append them to a table or just about anything...whereas in Excel...you can LOOK at them. Mostly, I need to isolate duplicates or any subset of data because of a need for ACTION...such as update, delete, tag, mark, whatever. So conditional formatting is exactly what it says it is...formatting, NOT a stepping stone to actually DOING something with the data that is identified. I hope this makes sense. Carry on...and don't forget, Access and Excel have a lot of crossover, you CAN use Excel for database work and Access for spreadsheet type calculations. It just isn't easy! Moron this later... Dave :-) ;-)

martinshort
martinshort

I like Excel and am very comfortable with it. I like Access and am very comfortable with it. Sometimes I like doing database stuff in Excel because I can't be bothered to set up an Access DB to do some ad-hoc data cleansing which - in my humble opinion - I can do quicker in Excel. The key here is the words "ad-hoc". Something more structured or longer lasting, yes by all means go for Access or SQL. Multi-user - definitely Access. In the meantime, let's stop having a go at an excellent application which has been designed (in part) to run as a database; even mimic a relational database. Gripe over! :)

mattohare
mattohare

Find Duplicates query only returns the duplicates. It sounds like Susan may have wanted to see them in the context of the rest of the result set. (That's often the case with me.) Also, not everyone has the luxery of working for a firm that will buy the full MS-Office package.

david.hanshumaker
david.hanshumaker

I think in your second column you mean to test if a2=a3. I've often done this similarly using nesting functions. With the same conditions (list sorted by the column with duplicates, headers in row 1), one scratch column can find duplicates with "=if(or(a2=a1,a2=a3),2,1). If an entry is the same as the one above or below it, it is a duplicate and assigned a 2 in this scratch column. Otherwise unique and a 1. Only one column to sort afterward this way.

staffordd
staffordd

"Use VLOOKUP to flag duplicates You could insert a column for marking duplicate ids in front of the column containing the ids. For example, put the heading "FlagDups" in cell A1 and "IDs" in cell B1 with the list of ids below it. Copy the following formula into cells A2 and all the way down to the cell of the last row of ids: =IF(ISNA(VLOOKUP(B2,$B$1:B1,1,FALSE)),"Unique","Duplicate") Autofilter the columns to select out the Duplicates or the Unique values. Each cell containing the formula basically looks up the id value next to it in the list up to the previous id value." Hi Victor I appreciate that your method would probably work in Excel. But once again, I must gently persist... Read your description of how to find the duplicates in Excel. All the steps, using VLOOKUP, etc. Rad through, count the steps, consider how complex it is to actually DO. Then read this: Query - one button push New - one button push Find Duplicates Wizard - one button push Add table to be checked to the query - one buttonpush RUN QUERY - one buttonpush So - FIVE mouse clicks. NO typing. No inserting. No formulas. No HASSLE. Five clicks, and a table of DUPLICATES appears. The duplicates are displayed immediately! You are done. So when I read this, "use VLOOKUP...." inside my head a voice starts screaming "no no no no USE ACCESS!" Now, if you don't have Access, and you can't get it, then your method is great, I am sure it works, but I will absolutely bet that it's far more complex, clunky, slow, and less accurate than doing it with the Find Duplicates Wizard in Access. Just compare the complexity of the instructions, yours vs. mine. Excel was NOT designed to quickly identify duplicates! Yes, it CAN do that. I can also play a bass solo on the bottom strings of my guitar, BUT, I would be better served playing them on a real bass! Not only is it incredibly fast in Access, but if you want additional criteria, say to limit the records to a certain subset within your data, you can just take the duplicates query, set it back to design mode, type in your criteria, let's say ITEM COLOR must = "blue", re-run the query (one buttonpush) and voila, now you have duplicates WITHIN THE BLUE ITEMS ONLY. And you could add criteria to ten other fields if you wanted. It takes SECONDS. My usual disclaimer follows...not meaning to offend or upset anybody, trying to help, and I do admit, I get frustrated when I see folk doing tasks the hardest way possible when there is a MUCH faster MUCH easier way available...but that includes MYSELF, I hate it when I find out I've been doing something a slow or, forgive me, stupid way! I immediately learn the faster, better way and leave the complex, difficult, multi-step way behind. VLOOKUP is not simple to use. It's slow, and it demands that the data be set up a certain way just for it to bloody WORK ! In Access, it doesn't MATTER. None of this "it must be in the first column" blah blah blah and "it must be in this position"...blah blah blah, none of that, you just have... A TABLE and a QUERY you push one button and you have DUPLICATES end Happy duplicating !!!!!!! peace love and understanding dave :-)

staffordd
staffordd

"Amen! Loads of good info about BOTH applications to be had here. " Yes but part of me thinks I should go away and start an Access thread and let this be an Excel thread as it started out to be. Sure, lots to learn about both. But I've used both, for 13 years, and I hate to say this but for a lot of tasks, particularly...data tasks, data cleansing, data parsing, data duplicates, data unmatched, data searching, data querying, data matching...I must heartily recommend a DATAbase tool... Microsoft Access. A huge part of me just feels that while Excel ***can*** do some of these things, and some of them it does reasonably "well", the relative difficulty is such that these tasks are complicated to set up and execute in Excel, and relatively quick and easy in Access! And whether you will or no, I am telling you the truth! Despite this, I'm sitting here watching a whole new raft of incredibly convoluted, complicated, tortuous A2=A3 formula vlookup insert column when titles match make them all the same etc. new ways to do a database task in an accounting tool. For accountants, and anyone working with financial data, or doing a lot of math - I would recommend Excel every time. I use Excel when I need to do things with NUMBERS. But when I need to work with DATA, I turn to the DATABASE product. Pure logic. NUMBERS = Excel DATA = Access WORDS = Word PRESENTATIONS = Powerpoint PROJECT MANAGEMENT = Project FLOW CHARTS = Visio. Any Microsoft reps want to speak up and tell me if I have got it wrong? Please don't be afraid of Access. If you could just try it, if you could see how EASY it is to perform the same function of a VLOOKUP or an HLOOKUP or finding duplicates, how incredibly quick, simple and easy it is in Access, you'd PROBABLY never use Excel to do those tasks again. I don't. I haven't done a VLOOKUP now for...about two and a half years. Because if I need that kind of query power, I go to where the powerful queries are - Microsoft Access. Microsoft owes me if there is a sudden increase in the purchase of Access licenses :-) Sorry to bang this drum so loudly, but it's really hard to sit and watch this, when I KNOW how much time I could save you all... I love Excel. But I love Access...a lot lot lot more. Fast, easy, powerful. NOT adjectives I would apply to some of the more complex Excel tricks. I don't like needless complexity... Usual disclaimer goes HERE. Have fun! Dave :-)

mark
mark

Good luck!

Marshwiggle
Marshwiggle

I think a point has been missed here. The article was about returning unique lists in Excel, and whether it may be easier in Access is irrelevant. It's also easier in Oracle, SQL Server, and any number of other database applications that, as dan-knight mentioned above, many users don't have skills for, or access (NPI) to, since it's not included in several versions of Office. The article is clearly intended to help users who, for better or worse, are using, and already have their data in, Excel; and to that end, it serves its purpose very well.

mattohare
mattohare

I think you're writing from an application development point of view, not that of a data scrubber. I scrub a lot of data. Excel is a step, and Access doesn't have to be. First, MS has done its best to make Access less usable over the years. Even after turning to multi-window mode in Access 2007, the query and table windows still come out taking the whole parent window and have to be resized. Moving data from Excel to Access and Back can be a headache, when the final destination is SQL Server. Second, One often needs to write a query to do in Access what Excel can be done in a Search & Replace. I can take a page of rather unruly data, and shape it into a good set of normalised tables much more easily than I can in Access. In Access, I'd need to write VBA code for about every third thing I do. This tip gave me an insight into making Excel a better scrubbing tool.

mille383
mille383

Say, what is this Epsilon button and 'Aggregate Query' function that you speak of? I thought I was pretty good with Access but I've never heard of or seen either. But then, we're still on 2000. Could that be why? Please advise. Thanks. R

j_eyon
j_eyon

Access is the right tool if there's an Access programmer around. Otherwise, the right tool is the one the users know how to use.

dan-knight
dan-knight

Couldn't agree more with the post re: Access is the better tool. However, my experience is that the vast majority of Excel users haven't a clue the differences between their Excel lists and a relational database is. They simply don't have the tool (Access - either the actual software or the skills) in their toolkit. In this case, I lend my support, this article is one the better one on TR re: Excel.

Dayle1011
Dayle1011

I have a quick question about the Access component in your post. BTW, thank you, I wasn't offened :o). Anyway, when you are creating an unduplicated or unique list, don't you have to put a Where statement in their like Where and in the criteria row >1? I could be SOO wrong. Thanks

martinshort
martinshort

Dave Likewise. It's been a real pleasure bantering with you so thanks. =============================== On a slightly different note, I was just reviewing our various conversations, and I spotted something that you might be interested in. You said earlier that "I champion the use of the query (which if, they added it into Excel, wow, what a tool that would be!" They have! Excel does come with Microsoft query which looks like a poor man's version of the Access/SQL server query designer. I wouldn't usually bother though as it looks and feels like Access version II; a strong case of arrested development! Access/SQL server is so much better. Also it's a bit clunky as you seem to have to save the query as a separate file from the Excel worksheet. It also doesn't appear to be possible to link direct to an internal sheet, just to external data sources (which could be a different sheet in the same workbook) However it does have its uses. For example, a few days ago a colleague needed to automate and import to Excel the results of an Access query with an ODBC link to a password protected Oracle database. It proved impossible to automate as you couldn't save the Oracle password 2 tube stops away from the source. On a whim, I suggested copying the SQL from Access to the Excel query editor and doing away with the middle man. It worked like a dream.

staffordd
staffordd

"In response... I think you and I are probably not so different :-) " Thanks for this Martin. It's funny you would say that, because that's what you find if you just take the time to understand "why" someone has a different viewpoint. I understand why people use Excel to do database tasks. But...I want to give them an option, a way to do things faster if they want it. That's "why" I push Access so hard, not because I am in love with it or I work for Microsoft (I do neither!!!!) but because of my own frustration with slow, cumbersome, overly complex programs, processes and so on. I want the fastest, best, quickest...but most important to me, MOST ACCURATE programs and processes available, so I don't spend hours doing something that COULD take 15 minutes. But, I would even sacrifice SPEED if a slower process gave me a MORE ACCURATE or MORE DETAILED answer. So even while I value speed, it's not my first target...accuracy, and thence data quality, is. I suspect that you are absolutely right, that we are more similar than it would have at first appeared, and for me that's a good lesson to remember, that it's always good to UNDERSTAND where someone else is coming from before you misjudge them... ...and in the end, we both learned a bit more about why we were both saying what we were saying, and suddenly, communication occurs! If only everyone could be so lucky. If they were, there would be no more misunderstanding...and no more war. Now I am really going off the rails !!!!!! But I can dream. Thanks for saying this, that was a classy thing to do! Cheers mate Dave :-)

martinshort
martinshort

...unless of course you are my boss. The things he does with vlookups. Actually I spend a lot of time working on an IMB iSeries database which is a series of flat files that someone several years ago pretended was relational. I reckon that if you can emulate a relational db in O/S400, then you can do it in Excel too - just not terribly well and open to errors!

martinshort
martinshort

I think you and I are probably not so different :) Cheers Martin

mille383
mille383

Excel is a database. What it is not, never has been, and never will be, is a RELATIONAL database. If the data needs to be worked in any way other than linear, then Access is the right tool. It's all about being able to create and manipulate data relationships.

staffordd
staffordd

"Waving the flag for Excel as a database tool I like Excel and am very comfortable with it. I like Access and am very comfortable with it. Sometimes I like doing database stuff in Excel because I can't be bothered to set up an Access DB to do some ad-hoc data cleansing which - in my humble opinion - I can do quicker in Excel. The key here is the words "ad-hoc". Something more structured or longer lasting, yes by all means go for Access or SQL. Multi-user - definitely Access. In the meantime, let's stop having a go at an excellent application which has been designed (in part) to run as a database; even mimic a relational database. Gripe over! " Hello Martin Short! Excellent post, and even though my posts must seem incredible anti-Excel and pro-Access, if you read them carefully you will see that I praise Excel repeatedly, I point out that it can do a LOT of things faster and better than Access, and so on. However...when it comes to ad hoc - yes of course, there will be VERY SIMPLE data tasks where using Excel is quicker and it's totally NOT WORTH setting up an Access db. I could not agree more. But, I don't think it's fair to intimate that Excel should be used for ad hoc work and Access for more long term work, because before I ever LEARNED Excel, I did ad hoc in Access for several YEARS. And it's quick, easy, and once you gain speed and understanding of how to write queries, it becomes nearly equal, if not equal, to Excel in speed and useability for ad hoc projects of ANY size. I don't know how many times I've gotten data from somewhere, from a tool or a database or whatever, and put it into Excel, did some manipulation, spent time formatting data, etc. only to find that the NEXT time I got the same data, it now exceeded 65000 rows, or I am now required to do massive updates to it, so I end up building an Access database and using that instead, and it turns out I should never have bothered with Excel. That's just the simple truth in my case. You raise a number of excellent points, and I want to say I am really, really NOT having a go at Excel. I am not wholly convinced that Excel really is MEANT to be used as a database, but having said that, I have often used it as such, and the macros in particular are spectacular, and if you know a little vba, you can be super dangerous with Excel macros. In the end though, it boils down to COMFORT. I am unusual in that I was taught Access FIRST, long before I ever had to use Excel, whereas 99.9% of the rest of the world seems to use Excel and never learn Access, or use Excel and THEN learn Access, or whatever, as if the natural transition is Excel then Access. In that case, I would expect (and I see it in the world) a lot of comfort with Excel, and a lot of discomfort with Access. But for me, it's actually the reverse! Part of the reason I champion doing things in Access, and I champion the use of the query (which if, they added it into Excel, wow, what a tool that would be!) is that my skill in Excel is rubbish compared to my skill in Access. I am slow and stupid in Excel. I have to read the help file to do new things. Whereas in Access, I just....GO. So if I have to have something done THAT DAY...it's Access. If I have time to fiddle with it, and it doesn't have to be done THAT DAY...I might try Excel, particularly if I wanted the power of conditional formatting, pivot tables, pivot charts, etc. all of which, I might add, are conspicuously ABSENT in Access! The pivot chart is absolutely amazing, and that's not the only thing. You are fortunate though, because you like both tools, and you are comfortable with both tools. I would guess that the absolute majority of people like Excel and are comfortable with it, but do not like/fear Access, and are NOT comfortable with it. Because of that, I tend to wax profound about the qualites of Access, because I want to break through that fear, and get Excel users to not be afraid of Access, but to embrace it, start to use it, for the tasks that are very cumbersome indeed in Excel. Anyone who is lumbered with a long, complex Excel process might find doing the same process in Access extremely refreshing...but, they have to develop a comfort level with the app and particularly learn how to query well and quickly...which takes TIME to learn. And one thing we often don't have a lot of, is time. But I really wasn't having a go at Excel. It's an excellent app. I've stated elsewhere that I somewhat disapprove of some of the strange ways it gets used, when there is a perfectly good tool being ignored, but it's natural that that would bother me, since for years Access was all I knew or used! I personally wouldn't pigeonhole ad hoc = Excel, longer-lasting = Access...because as these discussions continue to reveal, there IS crossover, it IS viable to use these tools in unique ways for which they maybe weren't intended, and that's OK...it's just not my preference. Thanks for your excellent "gripe". Carry on the good work... Dave :-)

staffordd
staffordd

Hello "Find Duplicates query only returns the duplicates. It sounds like Susan may have wanted to see them in the context of the rest of the result set. (That's often the case with me.)" Dave - well said and fair enough...except, it's quite easy to do an unmatched query as well to see the items NOT in the duplicates query results, and possibly append them together if desired. So - original table, find unmatched query, against the duplicates QUERY (not table) will show you the records NOT in the duplicates query, and if for some reason you really needed to see the results, you could MAKE TABLE of the unmatched, and then append the duplicates to it - which is pointless, as that then would equal the original table. My point being, that there will be/is a quick quick way to do this in Access, whereas finding duplicates, and DOING something about them, in Excel, I find slow, difficult and cumbersome, and if for example I wanted to ... tag the duplicates with the work DUPE for 100,500 records a) I couldn't because it only does 65,000 and b) I would have to do so manually rather than with a 15 second UPDATE QUERY. The power of QBE selection criteria plus the power of the instant update of thousands of records...well... "Also, not everyone has the luxery of working for a firm that will buy the full MS-Office package. " Again, fair enough, but, if you can show them how much time can be saved by, for example, moving a slow, manual Excel process into a pushbutton process in Access including massive updates, then I often find they have enough dosh to fork out for one lonely Access license. You can get them separately and cheap! My .02 as usual...sorry to be such a pain in the arse. Dave :-)

lwilliamson
lwilliamson

I found the Excel solution to be clunky to execute, given the size of the dataset I was using and the number of breakouts, but I'm not familiar with Access and I needed a solution THAT DAY. I was able to identify the issues and alert the consultants I support THAT DAY. Thank you again to the individual who helped me. However, posts to this and other articles have shown me that I will benefit from learning Access and I appreciate your input on this subject.

staffordd
staffordd

"Unique Filter + Count = Success! Thank you - you've made my day! Once I put in the specific range (i.e., A2:A18), it worked like a charm." OF COURSE it works. But...try it in Access now. It's faster, better and you can alter the query ENDLESSLY, without formulas, by typing criteria into the QBE grid, based on any field. Try it in Access. That's all I've been saying, but what I am seeing here is a desperate clinging to Excel, FORCING Excel to do what we are afraid might actually BE quicker and easier in Access. That's what I fear is happening. Forgive me, but it seems as if it's being a bit close-minded, and assuming that Excel is "the only way". It's not! Apologies again if my Access-bias offends anyone at all... The Aggregate Select Query in Microsoft Access - the quickest, easiest way to total, sum, group, count, etc. in the world. Try it - you'll like it. No formulas. No cells. No ranges. One table, one query, you pick the criteria YOU want. Complete power and flexibilty. Did I mention there are no formulas needed? D :-)

lwilliamson
lwilliamson

Thank you - you've made my day! Once I put in the specific range (i.e., A2:A18), it worked like a charm.

staffordd
staffordd

Hi "I think you're writing from an application development point of view, not that of a data scrubber. I scrub a lot of data. Excel is a step, and Access doesn't have to be" Wow that really threw me for a loop. I've been scrubbing, cleansins, de-duping, updating, parsing, splitting, catetgorising, normalising, rationalising, and god only knows what else to data for 13 years now. LOTS of data. I am NOT a developer. I am a data analyst. A "data scrubber" if you will. And, I have used Access for ALL of that time, without ever really learning how to write code. The first job I had involved over a million and a half product records. Excel, with all due respect, falls over at 65000. Search and replace in Access is identically easy in Access as it is in Excel. Access has not gotten more difficult in my opinion, it has got more features and more power, and sure, more advanced functions probably, but I hardly use them - the power of the update query, with it's limitless criteria and the ability to SELECTIVELY update records based on very complex criteria indeed - the power of that update query alone kicks Excel's ass to hell and back in my humble opinion. As far as the complaint about having to resize the windows is petty, and also solvable, there is a way to have Access automatically resize them and position them any way OnDatabaseOpen or some such - but I don't even bother. If it really bothers you, I could re-learn how to do the Resize window for you. But I would suggest that it's not an important point at all. I have watched over the years, people continuing to bastardise Excel, which is a fantastic tool for accountants and mathematicians, but it's a spreadsheet, NOT a database. So many people try to use this spreadsheet as a database...but, it's NOT a database! It's not meant for deduping and updating and scrubbing data...yet, it gets used for that all the time, when Microsoft has provided a dedicated tool for those tasks - Access! Writing queries is SO easy in Access, that if you find it daunting then I think you haven't really taken the time to understand the ease of use and the POWER they have. People keep using Excel as a data tool, for all KINDS of unsuitable tasks, when Microsoft has provided a DATA TOOL, which is a DATA BASE, to work with...DATA. (I see a theme developing here...) Excel is limited to 65000 rows. Access is not. Excel cannot update records with a buttonpush based on complex selection criteria, or even select them with the accuracy and power of Access. Moving from Excel to Access is easy if you know what not to do. But why bother - the clearest path to me is...if you have DATA, and you need to work with that data, to cleanse or update or de-dupe or anything that's not a calculation, you should put that data into a DATA SCRUBBING tool - Microsoft Access. I could argue further that you wouldn't make a flow chart in Word when you have Visio. It would be pointless. I mean sure, you COULD make a flow chart in Word, but it would take forever and look like crap, because Word doesn't have all the fancy templates and arrows and so on. You could also make a PowerPoint style presentation in Word, but again, it would be pointless, look like crap, and would be far quicker and easier in....PowerPoint. So why is that true, but people continue to use Excel in place of Access? I think it's because they think Access is "too difficult" when in fact, to me, some of the features of Excel are extremely tricky and fiddly, when I finally learned how to make pivot tables I was astounded at their strangeness, compared to the equivalent process of creating a table using queries. Strange! Not easy! So it really depends on viewpoint. I find Excel to be slow, overcomplicated, and not transparent. Access is just...rows and columns, tables and queries. Rows and columns, tables and queries... and so on. I will state again, not meaning any offense to anyone, that one should use the right tool for the right job. Ask Microsoft. They would tell you what they intended these applications FOR. What people use them for, will FOREVER baffle me. The same thing applies to the difference between Access and SQL server. If you have 4 million records with 300 columns of data, Access is the WRONG TOOL. Full stop. That needs to be a SQL database. But it's interesting, because of SQL limitations, the best way to work on SQL data...is to export it to Access, run append/crosstab/update queries on it, and then reimport it to SQL. Because Access is fast, but still robust enough to run an update on a million rows if need be. I've never seen Excel do that. SQL can't do update queries either. So in that case, we are down to one choice - Access. But I do really feel a bit...annoyed by your assumption that I am a developer. I am a data person first and last. I worked for Commerce One as a content engineer. I've worked with millions and millions of rows of product data, mostly on identifying and attaching additional data elements to make the legacy data more robust and usable. Or categorisation. Or processes for update. But I've also spent mega time sorting, updating, de-duplicating and cleaning enormous amounts of data. I don't even THINK about the tool. I think about the quality of the data, and what's the quickest way to improve it. Usually, that's a combination of delete queries, append queries and always always always, update queries. Learning to use the QBE grid in Access WELL is the key. If you can find the exact records you need to update, say items that are blue, have a weight between 10,001 and 19,999 pounds, were manufactured in Sweden, between 1999 and 2009, with serial number series beginning with 00017, you can write a select query in Access in well under a minute (I could do that one in less than 15 seconds I reckon) which would return THOSE and only THOSE records. Then change the query design to UPDATE, update the fields/values you want to, which again, takes seconds...total time elapsed, less than two minutes. To update 20000 rows based on very complex criteria. How do you do that in Excel??? Can someone describe in detail the process to identify those same records and update let's say the 20000 rows they represent (so it will actually run in Excel) in Excel??? I am NOT trying to demean Excel, or Excel users, but I will state two things clearly: 1) I am NOT a developer, I am a highly experienced data analyst, cleanser, scrubber, categoriser and updater 2) It's my personal view that one should always use the right tool for the right job, and sometimes people don't Excel is amazing. It's really user friendly, and very powerful for many, many tasks. For me, I only use it when I know it will be faster and better than what I could do in Access. As I said in my original post, there are things that Excel CAN do better, faster and more efficiently than Access. So for those tasks, I choose Excel, every time. But if that is true, then the reverse must also be true. If Excel doesn't have easy to use de-dupe tools, then I will ALWAYS ALWAYS choose a find duplicates query wizard in Access. Of course nowadays, for duplicate files, I actually would recommend a separate app - in my case I've chose NoClone, which I use to analyse, report, and dedupe several hundred gigabytes of data across 18 large virtual servers. But of course, when I extract a file of duplicated items using NoClone, the first thing I have to do is...bring them into Access so I can do some analysis. And, afraid there are usually more than 65000 so Excel would again, not be a viable choice. Hmmm...another recurring theme, examples where Excel simply can't be used because it's stops working at 65000 records. That limitation alone put me off Excel 13 years ago, and I dove into Access and never looked back. I think Access is misunderstood, feared needlessly, people assume it's really difficult when to me it's barely more difficult than Excel, and so on. Stop being afraid of it, and just...TRY it. If you really did, you would stop using Excel for these data cleaning tasks. I am quite sure of that. So after 13 years of Access faithfully supporting me and helping me in my work, not to mention putting food on the table, I am afraid I feel I have to defend it, and to point to it's power and ability, when I see Excel being used, in my opinion wrongly, in it's place. Give Access a chance ! And finally...the idea that you can normalise data faster and better in Excel...I have normalised and rationalised millions of rows of data, and I've NEVER considered Excel for the job. Just knowing how to use three simple functions in Access, MidString, RightString and LeftString, you can locate and update unnormalised entries and repair them WHOLESALE using simple update queries, that take seconds to write and even less time to run. Normalisation and rationalisation are far quicker, easier and more accurate in Access. If you don't want to believe me, ASK MICROSOFT. They would say the same thing. If I were anywhere near where you are, I could SHOW you what I mean, but for now, I would ask that you take my word for it - Access is THE normalisation tool, because of the extreme power of the Update Query coupled with using regular expressions in the QBE grid to carefully and accurately select the records to be updated. Excel has no equivalent that I am aware of. Apologies if I have upset or offended ANYONE, this is NOT my intent, my intent is to defend the honour of Access, which I feel is woefully misunderstood and under used. And if Excel works for you - use it! I would only just ask you to CONSIDER looking at what Access can do, and I feel that if you did, in time, with just a little experience, you would agree with EVERY word in this post (even if you utterly disagree right now!). All the best~~~~~~~~~~~~~ dave

staffordd
staffordd

"However, I am now motivated to actively make use of this handy little feature wherever and whenever I can. Thanks for the inspiration!" The Aggregate Query is an incredibly quick way to get some "quick stats" for a table. You can learn a lot just by counting and summing. One thing I tended to always do (back when I used Access a LOT) is I would run a group by, then change it to a make table and make a table of the results...then run MORE aggregate queries against the GROUPED table to sum it or whatever you needed to do. So in most databases, I would have the main data table, and then another table with the same name + "Grouped" at the end - so I could either query the ENTIRE set, or just the grouped attributes - very handy, very many times. But I'm glad it's all coming back to you. Whereas I am busy forgetting it, since I don't use Access that much compared to how I used to use it... Have fun! Sigma all day long... Dave :-)

mille383
mille383

Once I got the icon figured out (Epsilon, Sigma - it's all Greek to me :-) and actually used it, a memory was sparked from deep within my swiss cheese brain. But it's the old 'use it or lose it' adage. Since I didn't make a habit of using this function way back when, I?d forgotten about it. However, I am now motivated to actively make use of this handy little feature wherever and whenever I can. Thanks for the inspiration!

staffordd
staffordd

"Say, what is this Epsilon button and 'Aggregate Query' function that you speak of? I thought I was pretty good with Access but I've never heard of or seen either. But then, we're still on 2000. Could that be why? Please advise. Thanks. R" Hi, apologies, I didn't actually see this exact post the first time around, I think I answered this in that I made an error in my post, I said "Epsilon" when I meant "Sigma". So I hope it's clear that when you start with a select query in design mode, and then push the Sigma button, it changes the query to an aggregate query, which then allows you to use Group By, Count, Sum, Total, Average and so on...very powerful query, I have used it thousands of time to get quick stats. You can tell a LOT about data just by Group By and Count. And it has lots of other functions which we've touched on such as Where (to add conditions) or Expression (to add fields into the results without acting on them) and so on. If still unclear, please say - and sorry, my Greek is obviously RUBBISH ! Cheers, Dave :-)

staffordd
staffordd

"its the SIGMA symbol on the 2000 and 2003 toolbar the view dropdown (which yoiu may need to expand) while in the query design shows "totals" and toolbar button shows a sigma symbol with the same results. Both change the query to an "totals" aggregate query." It's my fault, I said "Epsilon" when I meant SIGMA !!! It's always been SIGMA. That changes the query view from normal/select to aggregate/select, which gives you the totals/sums/counts/ etc options. My apologies for the confusion! Dave

tballky
tballky

the view dropdown (which yoiu may need to expand) while in the query design shows "totals" and toolbar button shows a sigma symbol with the same results. Both change the query to an "totals" aggregate query.

ssharkins
ssharkins

Guys... the functionality doesn't rule the application, the business does. Just because Access can easily and quickly create a unique list is not a reason to drag an Excel spreadsheet into Access. If Excel is the right tool for the data, dumping that data into Access just because Access can handle a unique list easier doesn't make any sense to me. Nor does the need for a unique list suggest to me that Excel isn't the right tool for the data. I guess I'm missing some important point, but I don't get it -- and my speciality is Access, not Excel, so I'm not prejudice toward Excel.

staffordd
staffordd

"Excelling Access Couldn't agree more with the post re: Access is the better tool. However, my experience is that the vast majority of Excel users haven't a clue the differences between their Excel lists and a relational database is. They simply don't have the tool (Access - either the actual software or the skills) in their toolkit. In this case, I lend my support, this article is one the better one on TR re: Excel." Dan...thanks. I am no longer alone :-) This is not the users fault. For some reason, everyone learns Excel, and then tries to use it for EVERY task, and Access is considered "too difficult"...when it really, really isn't. It's the only relational database that IS user friendly, easy to learn, and incredibly powerful. I'm starting to think I should quit my job and open a school for frustrated Excel users - and teach them Access. I know once I learned it, there was NO looking back. Ever. Thanks for your kind words. I am actually quite passionate (as you will have seen from my last tirade!) about this topic, I would like to see people BENEFIT from the power of Access, and leave the limitations (and misuse of) Excel behind. Some will see and some will refuse to see. But we can only do what we can do, I love to help people, but if they don't want help you can't force them! Have a great day, and it's so nice to have a supporting post, I really appreciate it - sometimes you begin to think the whole world has gone mad, but then you find that someone thinks the same way you do, and it's fantastic! So thanks for that. AND...the article was GREAT ! I only just wanted to say, there are other ways to skin this data "cat"... Dave :-)

staffordd
staffordd

"I have a quick question about the Access component in your post. BTW, thank you, I wasn't offened :o). Anyway, when you are creating an unduplicated or unique list, don't you have to put a Where statement in their like Where and in the criteria row >1? I could be SOO wrong." So sorry it's taken me so long to reply to this. The answer is...maybe. You can add a where clause, you can add an expression, like a normal select query, the aggregate query can be endlessly customised in a million ways. The most basic is what I described, ONE field in the grid, hit SIGMA, and run the query with the GROUP BY. That groups the contents of THAT FIELD into a unique set. But, if you DESIRE, you could bring down another field, say COLOUR, and run the same query with WHERE selected under COLOUR, and then the criteria "BLUE" underneath - then the query will return the GROUPED/UNIQUE values in your original field ONLY WHERE Colour = "Blue". If you have a value you want to SEE in the query, but not have the query "do" anything with, use "Expression" - that would add in a third field for example, but it wouldn't total or sum or count or group or anything, it would just...be. Sigh. I HOPE this is clear...if it isn't, please let me know! Good luck have fun keep GOING!!!!!!!! Dave :-)