Software

How to use built-in Excel features to find duplicates

This comprehensive review shows various ways to use advanced filtering options and conditional formatting to locate duplicates in Excel.

hero

Image: iStockphoto.com/STILLFX



Duplicate values aren't bad. In fact, most are necessary. However, duplicate records can skew reporting and analysis. Whether you're finding duplicates in a single column or looking for duplicate records, Excel can do most of the work for you. In this article, I'll show you easy ways to find duplicates by applying advanced filtering options and conditional formatting rules. First, we'll define the term duplicate—it isn't ambiguous, but context determines its meaning. Then, we'll use Excel's built-in features to find duplicates.

I'm using Excel 2016 on a Windows 10 64-bit system. You can apply the same techniques to earlier Ribbon versions of Excel. You can work with your own data or download the demonstration .xlsx file.

Defining the term duplicate

A duplicate can be confined to a single column; any value that occurs more than once in that column is a duplicate. In an Excel data set, duplicates in a single column are common and seldom incorrect, but they can be. The term can also refer to multiple columns or even all columns for a given record. For instance, a student database might have a date and grade column where lots of values occur more than once in both columns. However, two records that apply the same assignment, date, and grade to the same student might be considered a duplicate and if so, should be accommodated (or deleted) to avoid incorrect analysis and reporting. Your definition of duplicate will depend on the business rule you're applying.

SEE: Windows 10 Anniversary Update: Watch out for these nasty surprises

Filtering out duplicates

Deleting might be the primary reason for finding duplicates, but that won't always be the case. Sometimes, you might want to work around them. If so, Excel's Filter feature can help. It won't find duplicates—filtering doesn't work that way. However, you can use the feature's more advanced options to hide a duplicate so you can work with a unique set of records. You can filter the records in-place or you can copy a set of unique records to another range.

Let's begin by filtering the data set shown in Figure A in-place. Some of the column values are duplicates, which is okay. If you look closely, you'll also see that an entire record is duplicated; the records in row 4 and 11 are the same. Now, let's filter out the duplicate record as follows:

  1. Click anywhere inside the data set and then click the Data tab.
  2. In the Sort & Filter group, click Advanced. The resulting dialog will suggest the range. If you click a single cell before engaging the feature, Excel defaults to the contiguous content to determine this range.
  3. Check the Unique Records Only option (Figure B) and click OK. As you can see in Figure C, row 11 is now hidden.

Figure A

Figure A
You can hide duplicate records.

Figure B

Figure B
Choose the Unique Records Only option.

Figure C

Figure C
Excel hides the duplicate record in row 11.

In this case, you're hiding duplicates records so you can temporarily work with a unique data set. You could copy a unique set to another location and work with it, independent of the source data set as follows:

  1. Click anywhere inside the data set and then click the Data tab.
  2. In the Sort & Filter group, click Advanced.
  3. In the resulting dialog, click Copy To Another Location.
  4. Check the Unique Records Only option.
  5. Identify the copy range by clicking inside Copy To Range and then clicking an anchor cell in the sheet (Figure D) or by entering a range reference.
  6. Click OK. The resulting data set, shown in Figure E, isn't hiding a record; that record wasn't copied.

Figure D

Figure D
Identify the copy range.

Figure E

Figure E
The copied data set doesn't include the duplicate record in row 11.

You can't use the advanced options to hide duplicate values in a column, but you probably wouldn't want to anyway. This feature always evaluates the entire record. If you're working with a Table object that includes a totals row, or you're using a SUBTOTAL() function to evaluate columns (such as Commission), the function will evaluate only the visible records.

SEE: Microsoft Stream: The future of secure business video

Easy formatting for duplicates

Perhaps the easiest way to find a duplicate value is to let a conditional formatting rule highlight it. Fortunately, there's a built-in rule that's easy to apply. In addition, you can identify single-column duplicates and whole records.

First, let's find the duplicate values in the Region column as follows:

  1. Select the values E4:E14 and then click Conditional Formatting in the Styles group.
  2. From the dropdown list, choose Highlight Cells Rules and then choose Duplicate Values from the submenu (Figure F).
  3. In the resulting dialog, you can change the default formatting scheme if you like, but for this example, click OK without changing the default settings. Figure G shows the results—only one region value, Central, isn't repeated.

Figure F

Figure F
Select the built-in conditional rule.

Figure G

Figure G
Only one region isn't repeated.

To highlight a duplicate record, as shown in Figure H, repeat the process of selecting the entire data set (B4:F14) in step 1. Although you can spot the duplicate records, this choice finds all duplicates, not only duplicate records.

Figure H

Figure H
You can find duplicate records.

Using another built-in rule, you can highlight unique values. We won't go through an entire exercise, but you should know about the rule. Select a range (or a column) and click Conditional Formatting in the Styles group (on the Home tab). Choose New Rule and in the resulting pane, select Format Only Unique Or Duplicate Values in the top pane. In the lower pane, choose Unique from the Format All dropdown, as shown in Figure I, and continue as you normally would to create a conditional rule format.

Figure I

Figure I
You can also highlight unique records using a built-in rule.

Formatting only the duplicate records

Nothing we've yet seen has highlighted only the duplicate records. To the best of my knowledge, there's no quick built-in rule for doing so. You could work harder than necessary to come up with a formula that handles this, but if you're familiar with my work, you know what I'm going to say next—let's use a helper column instead!

We can concatenate all the values to create a string for each record and then use a simple counting function to count them. Trust me, it'll be much easier than trying to do it all with a conditional rule.

The first step is to concatenate the columnar values by entering the following expression into G4 and copying it to the remaining rows:

=CONCAT(B4:F4)

For the sake of convenience, I put the concatenated results next to the data set, but you can put them in an out-of-the-way spot where no one sees them. (Don't worry that the results display each date's serial value instead of its formatted value—it doesn't matter.) The next step is to add the conditional formatting rule as follows:

  1. Select the data set, B4:F14; don't include the concatenated strings in column G.
  2. Click Conditional Formatting in the Styles group on the Home tab.
  3. In the resulting dropdown, choose New Rule.
  4. In the top pane, select Use A Formula To Determine Which Cells To Format option.
  5. In the lower pane, enter the formula: =COUNTIF($G$4:$G$14,$G4)>1
  6. Click Format, click the Fill tab, choose a color, and click OK. Figure J shows the rule and the format in place.
  7. Click OK to see the results shown in Figure K.

Figure J

Figure J
Count the concatenated strings.

Figure K

Figure K
This simple conditional formatting rule highlights duplicate records.

The COUNTIF() function returns true if the concatenated string in the corresponding cell occurs more than once. When the result is true, the conditional formatting feature applies the specified format. If the counting function returns false, Excel doesn't apply any formatting. We could do this without the concatenated strings in column G. But we'd have to account for all the columns, and the resulting formulaic rule would be horribly complex to enter and maintain. When using this technique for your own work, be sure to apply the absolute and relative references as shown.

Now there's one more possibility we should consider. You might not want to highlight the first occurrence of the duplicate record as the technique above does. You might want to highlight the second and any subsequent duplicates. Using the helper column technique, this requirement is easily accommodated.

First, add a counting function to the data set by entering the following expression in I4 and copying it to the remaining records:

=IF(COUNTIF($G$4:G4,$G4)>1,"Duplicate")

The expression returns the string "Duplicate" for only one—record 11. Now, we can apply a conditional formatting rule that highlights only this record. If you like, you can delete the first conditional rule before adding the second. Either way, you can add the second rule as follows:

  1. Repeat steps 1 through 4 above.
  2. In the lower pane, enter the formula: =$H4="Duplicate"
  3. Click Format, click the Fill tab, choose a color, and click OK. Figure L shows the rule and the format.
  4. Click OK to see the results shown in Figure M.

Figure L

Figure L
Enter a rule that ignores the first occurrence of the duplicate record.

Figure M

Figure M
The new rule highlights only the record in row 11.

If you need to filter the record, use the formulas and then filter out the duplicates by filtering for "Duplicate" in column H.

Send me your question about Office

I answer readers' questions when I can, but there's no guarantee. Don't send files unless requested; initial requests for help that arrive with attached files will be deleted unread. You can send screenshots of your data to help clarify your question. When contacting me, be as specific as possible. For example, "Please troubleshoot my workbook and fix what's wrong" probably won't get a response, but "Can you tell me why this formula isn't returning the expected results?" might. Please mention the app and version that you're using. I'm not reimbursed by TechRepublic for my time or expertise when helping readers, nor do I ask for a fee from readers I help. You can contact me at susansalesharkins@gmail.com.

More Office how-to's

About Susan Harkins

Susan Sales Harkins is an IT consultant, specializing in desktop solutions. Previously, she was editor in chief for The Cobb Group, the world's largest publisher of technical journals.

Editor's Picks

Free Newsletters, In your Inbox