Software

Office Q&A: Consider a threshold when counting duplicate values in Excel

You can use a simple COUNTIF() function to count duplicates conditionally. This illustrated walk-through demonstrates the process.

hero

The word conditionally can seem like a serious monkey wrench when it comes to applying business rules—you can get all kinds of unique requests! For instance, Zahra wants to count duplicates, but not every duplicate. Rather, Zahra needs to know how many times a value is repeated over two instances. In other words, if a value occurs five times, the returned value Zahra needs to see is three. Articulating the condition can be difficult, but fortunately, applying it doesn't have to be.

I'm using Excel 2016 on a Windows 10 64-bit system, but this solution will work in most versions of Excel. You can try it out with your own data or download the demonstration .xlsx or .xls file.

SEE: Microsoft Office 365: The smart person's guide

The setup

Using the data set shown in column C of Figure A, a COUNTIF() expression returns the results Zahra needs. The matrix to the right isn't necessary, but it does provide a simple count that verifies the expression in column D's results. In addition, the complete solution does a bit more than Zahra requested. We'll explain it all as we go along.

Figure A

2016112a.jpg
Count duplicate values conditionally.

In the next section, you'll learn how to create a unique list of values and use Excel's Transpose feature to create the matrix header row. If you don't want to bother with the matrix, you can skip the next section and move straight to "The Solution" below.

The matrix

The first step in creating the matrix is to generate a unique list of values using an advanced filter as follows:

  1. Select the data; in this case, that's C2:C16.
  2. Click the Data tab and then click Advanced in the Sort & Filter group.
  3. Click the Copy To Another Location option.
  4. Click inside the Copy To control and then click a nearby cell in the sheet.
  5. Check the Unique Records Only option (Figure B) and click OK. Figure C shows the resulting list of unique values.

Figure B

2016112b.jpg
Use an advanced filter.

Figure C

2016112c.jpg
The advanced filter generates a unique list.

Next, use the Transpose feature to copy the unique values across row 3 to create a row of header values for the matrix as follows:

  1. Select E4:E9. (I've purposed excluded E3—you'll see why in a minute.)
  2. Press [Ctrl]+C to copy the selected values to the Clipboard.
  3. Select F3 (which will become the second cell in the matrix header).
  4. Click the Home tab. In the Clipboard group, click the Paste option's dropdown and choose Transpose (Figure D).
  5. Delete the values in E4:E8.

Figure D

2016112d.jpg
We'll use the transposed values as a header row.

There are seven unique values in the data set. With the matrix labels in place (E3:K3), let's add a COUNTIF() function to count the number of times each header value occurs in the data set in column C. In E4, enter the function

=COUNTIF($C$3:$C$16,E$3)

and copy it to the remaining matrix. Be sure to use the correct relational referencing:

  • $C$3:$C$16 refers to the fixed data set.
  • E$3 references the header values in row 3 that compose the matrix shown in Figure E.

Figure E

2016112e.jpg
The matrix returns a simple duplicate count.

The solution

At this point, subtracting 2 is all that's needed, but the matrix is unnecessary. Zahra prefers to display the results next to the data values (see Figure A). To do so, enter the following expression in D3 and then copy it to the remaining cells in the matrix as shown in Figure F:

=COUNTIF($C$3:$C$16,$C3)-2

Figure F

2016112f.jpg
Subtract 2 from the simple count.

Notice that I updated the criteria argument—it's now $C3, which references the data set instead of the matrix header values. In addition, -2 subtracts 2 from the simple count we saw earlier in the matrix.

Enhancements

The results, as is, might be a bit misleading: The expression returns -1 when the value is unique, which isn't representative of the expression's purpose. The value can't be duplicated -1 times. It might not matter, but if it does, you can add an IF() function to return a more meaningful value. Replace the expression in column D with the following expression:

=IF(COUNTIF($C$3:$C$16,$C3)-2>0,COUNTIF($C$3:$C$16,$C3)-2,0)

The results, shown in Figure G, are more meaningful to Zahra:

  • 8987321803 isn't repeated more than twice
  • 8971760033 isn't repeated more than twice
  • 8971759241 is repeated three times more than twice
  • 8950420032 isn't repeated more than twice
  • 8950366051 isn't repeated more than twice
  • 8950243864 is repeated one time more than twice

Figure G

2016112g.jpg
Use an IF() function to display a more meaningful value.

As mentioned, the matrix isn't necessary at all. You might prefer that structure—or you might not. In addition, the threshold of two could change. As soon as you reference -2 in the actual expression, someone will change the rule. Fortunately, you can easily adapt this solution to be more dynamic by adding an input cell for the duplicate threshold and referencing that value instead of using the literal -2 component in the expression.

To accommodate this possibility, I added a new row for the input cell. Then, I replaced the literal -2 threshold value in the column D expression with an absolute reference to the input cell ($D$1):

=IF(COUNTIF($C$4:$C$17,$C4)-$D$1>0,COUNTIF($C$4:$C$17,$C4)-$D$1,0)

The logic remains the same, as you can see in Figure H, but now the expression accommodates any duplicate threshold value.

Figure H

2016112h.jpg
The modified expression accommodates any threshold value.

Send me your question about Office

I answer readers' questions when I can, but there's no guarantee. Don't send files unless requested; initial requests for help that arrive with attached files will be deleted unread. You can send screenshots of your data to help clarify your question. When contacting me, be as specific as possible. For example, "Please troubleshoot my workbook and fix what's wrong" probably won't get a response, but "Can you tell me why this formula isn't returning the expected results?" might. Please mention the app and version that you're using. I'm not reimbursed by TechRepublic for my time or expertise when helping readers, nor do I ask for a fee from readers I help. You can contact me at susansalesharkins@gmail.com.

About Susan Harkins

Susan Sales Harkins is an IT consultant, specializing in desktop solutions. Previously, she was editor in chief for The Cobb Group, the world's largest publisher of technical journals.

Editor's Picks

Free Newsletters, In your Inbox