Software

Let Excel 2007's new Remove Duplicates feature find unique values in your data

Avoid redundancy with Excel 2007's new Remove Duplicates feature and easily produce a list of unique values for more than one field.

It is the end of the month and you need to write a quick memo to report on last month's sales. You have a spreadsheet listing 100 invoices for the month, including the Invoice Number in Column A, the Customer Name in Column B, and the Product Name in Column C. Because most of your customers place orders several times a month, you need a quick way to generate a listing of each customer that ordered from you last month.

Before Excel 2007, you would obtain this data using either Excel's Advanced Filter or Pivot Table feature. With Excel 2007's new Remove Duplicates feature, you have an additional, even quicker, way to produce this list. Follow these steps:

  1. Select the Customer Name column.
  2. Copy the data to a blank section of the worksheet. (You should leave at least one blank column between the original data and the copied data.)
  3. Click inside any cell within the copied cells.
  4. Click the Data tab.
  5. In the Data Tools group, choose Remove Duplicates, then click OK.

Excel removes all duplicate customer names from the copied list. You can also use Remove Duplicates to produce a list of unique values for more than one field. For example, to generate a list of customers and the products they ordered, follow these steps:

  1. Select the Customer Name and Product Name columns.
  2. Copy the selected data to a blank area of the worksheet.
  3. Click inside any cell within the copied cells.
  4. Click the Data tab.
  5. In the Data Tools group, choose Remove Duplicates, then click OK.

Excel generates a list of each customer and the products ordered, excluding duplicate orders for the same product from the same customer. You can then use the Sort function to make the list easier to read by following these steps:

  1. Click any one of the cells in the Remove Duplicates results list.
  2. Click the Data tab.
  3. Click the Sort button.
  4. Click the My Data Has Headers check box.
  5. Click the Sort By drop-down arrow and select Customer Name, then click OK.

Miss an Excel tip?

Check out the Microsoft Excel archive, and catch up on other Excel tips.

Help users increase productivity by automatically signing up for TechRepublic's free Microsoft Office Suite newsletter, featuring Word, Excel, and Access tips, delivered each Wednesday.

4 comments
lloydb
lloydb

Is there a way to use the remove duplicates feature to identify where the duplicates occured? Consider a large data set where the duplicates (based on the contents of one or more fields, such as given name and surname) have been entered "in error". The task is to create a list of those records which are duplicates. One alternative is to sort the data set on the fields in question, then manually scroll through the list looking for the repeats. There must be a better way??

alpuod
alpuod

You can identify duplicates using conditional formatting to format the duplicates with a fill color. Then you can filter the table by color to review the duplicates before deleting them.

ssharkins
ssharkins

There should be an easier, more efficient way that's less prone to error. I'll see what I can find out.

beegkitty
beegkitty

So, when I want to find duplicates in my excel files, I use the following: This requires that the data be sorted by the field that you want to check for duplicates as well as having a header row. Say the column is titled "Email". Sort your data by the "Email" column. Insert a new column directly next to the "Email" column. Name it "Duplicate". Note: This assumes that the "Email" column is the A column and the "Duplicate" column is the B column. In B2, use the following formula: =if(A2=A1, "Duplicate", "Unique") Fill down to the end of the list - so you should see =if(A3=A2, "Duplicate", "Unique") etc. Select the "Duplicate" column. Copy then paste special->Values. Select the column again, go to Format->Conditional Formatting. Set it to "Cell Value is equal to Duplicate. Then select the format to set the pattern to a bright color. I do formatting part just for a visual reference. But, once you have the data in the cells, you can sort by the duplicate column and find all of the duplicate records easily.