Software

Use formulas to find multi-column duplicates in Excel

Duplicates in the same column are easy to find by sorting or filtering. Duplicates that span multiple columns require a bit of setup, but the solution's not difficult to implement.
Finding duplicate values in the same column is easy; you can sort or apply a filter depending on the circumstances. Finding duplicates that span multiple columns is a tad more difficult. A sort can work, but then you have to find the duplicate values. So while it's better than no solution at all, it's not a good solution. You probably want a solution that kind of screams out at you Here I am! I'm a duplicate! Let's take a quick look at a simple example. The sheet below contains a column of dates and a column of initials. A few dates are repeated and a few initial sets are repeated; they represent duplicates within those columns. However, we're interested in records that repeat the same date and the same initials. That's what I mean by a multi-column duplicate. It's fairly easy to spot the duplicates, rows 8 and 9, in such a simple sheet, but what if you had hundreds or thousands of rows to check? An AutoFilter will work, but it's a vulnerable solution. In this case, there are five distinct dates. That means a user has to review at least five sets of records to find duplicates. Even then, you have to trust your user to actually spot them. It's not a good solution, especially if you have lots of data. You might try an advanced filter or even conditional formatting, but both would require so much hoop-jumping that I'm not sure the end result would be worth the fuss, when compared to the easy-to-implement solution that follows:
  • Concatenate the columns you're checking.
  • Use CountIf() to count the number of combined values.
Let's enhance the sheet above to see how this works:

  1. In cell C2 enter the formula =A2&B2. (You can combine more columns.)
  2. Copy the formula to C3:C9. Excel uses each date's serial value, but that won't interfere with the technique. However,  if your values contain times, it might, depending on how the time values were entered.

  1. In cell D2 enter the following formula: =IF(COUNTIF($C$2:C2,C2)>1, "Here I am! I'm a duplicate!","Original")
  2. Copy the formula to D3:D9. At this point, finding multi-column duplicates is as easy as sorting by column D (although this example doesn't require any sorting).

The IfCount() function counts the number of times the concatenated values occur within the extending range. If the count is greater than 1, the formula returns the string "Here I am! I'm a duplicate!"; when the count isn't greater than 1, the formula returns the string "Original." Only the first occurrence will be identified as Original. This technique easily adapts to include additional columns. Simply add each column to the concatenating formula (column C in this example). Of course, there are other ways to identify multi-column duplicates in Excel. If you'd like to share a more efficient solution or a more complex problem, please start a conversation below.

About

Susan Sales Harkins is an IT consultant, specializing in desktop solutions. Previously, she was editor in chief for The Cobb Group, the world's largest publisher of technical journals.

13 comments
vijaybhatt143
vijaybhatt143

i need all values for duplicate  value in other sheet plzz help me for this

Candace.1221
Candace.1221

We complete our staff roster on excel. One problem we have is that we have to enter in all shifts manually. Sometimes this causes one particular shift to be give to multiple staff causing double ups. We have staff names (rows) and dates (columns) as headings. We need to find mutiple shifts in the columns to stop the double up of shifts. Can anyone suggest a way to find double shifts within one column?

ocelot1
ocelot1

Hello, I would like some advice on how to remove duplicates from 2 different columns, but without removing duplicates contained in the individual columns themselves (if that makes sense - as quite often we have more than 1 copy of the same title, but for this purpose - they are not 'duplicates'!) Basically my data is for Sales and I want to be able to find the ones that haven't sold by matching the ISBN's from the orders, with the inventory data. Any help would be much appreciated. Please let me know if you need more information/clarity! Thanks, Julie

TerryRay
TerryRay

Could someone explain what does the part of the expression in this solution($C$2:C2) means?

DuhGreek
DuhGreek

Excel?s Advanced Filter will display unique multi-column values. First select the range to filter (or it will auto-select), then select "Data | Filter | Advanced Filter...". Be sure to check "Unique records only". From there you can filter in-place or copy the filtered results to a different location. - If copied to a new location, the duplicates are automatically removed. - If filtered in-place, you can mark the displayed rows by copying a tag using "Edit | Go To... | Special... | Visible cells only". Untagged rows contain duplicates. (Don't forget to "Data | Filter | Show all" to return to the unfiltered results.)

Arcturus16a
Arcturus16a

One might consider using a Pivot Table to search for duplicates. Place the Date across the Columns, the Rep down the Rows and Count the Dates as Values. Select the field with fewer results for the columns when Pivoting large data sets so it'll be easier to read. Alternatively, one could COUNTIF the CONCATENATEd results and the use the Report Filter in the Pivot Table to display only results greater than one.

carlsburger
carlsburger

Thanks for the tip Susan! Until 2007 there were lots of ways to find duplicates in *one* column, much harder in multiples. Just wanted to add a couple of things I've learned about dealing with duplicates in Excel 2007. At first glance, the Remove Duplicates button (Data tab) looks like the answer to your prayers. The thing to note is it is well and truly REMOVE duplicates. Not SHOW duplicates. Not LET ME LOOK at duplicates and DECIDE if I want to delete them or not! As most of our users prefer to do this, we are finding the Conditional Formatting more useful - under Highlight Cells you can choose to Highlight Duplicates with a font or background colour (or both). You can then use Sort or Filter to view by Colour. This effectively groups duplicates together and allows you to decide their fate! Just thought I'd share - when we first moved to 2007 I was seduced by the snazzy new Remove Duplicates button and only discovered the Conditional Formatting method later on.

RonaldH71
RonaldH71

I use this technique a lot, and I don't see an easier solution than this. But there is one possible flaw in this: If you have in one row an empty cell and in the next cell for example an "A", and in another row that first cell is an "A" and the next cell empty, it returns as a duplicate while it isn't. For that reason, I always add an extra marker for separating columns, something in the range of ~,@,#,^ or _. At least something it would be most unlikely to appear in your sheet.

DuhGreek
DuhGreek

When copying a formula to a range of cells, sometimes you want the referenced cells to be "locked" and sometimes you want them to move with the copy-to cell. The "$" notation locks the reference, making it absolute. Without the "$", the reference would change relative to the copy-to cell. So, for the cells in column D, the range being assessed could be expressed as "always starting with C2 ($C$2) and ending with the cell immediately to the left (C2 for D2, C3 for D3, etc.) You can independently mix locks for columns and/or rows: $C$2, C$2, $C2 and C2. (When editing formulas, you can easily change this by pressing the F4-ey while the cursor is touching a cell reference.)

libskrap
libskrap

Just in case you need it, the same methods are available in 2003 version.

smankinson
smankinson

Is there a best resource for the presentation side of EXCEL? I have used spreadsheets for years for analysis. I am quite comfortable in saying that this has been a great tool for covering the bases and for cross-checking. However, there is always someone that wants said spreadsheet to be more understandable. I know I am being generic but I'd take ideas for resources.

williams
williams

Good tip!. I was using =IF(AND(A2=A1,B2=B1 etc),"Duplicate","original") but this requires the data to be sorted.

TerryRay
TerryRay

Thank you for the explanation of this part of the formula. Now it all makes sense.