Detect and Remove Duplicate Values in Excel

Let's return to Microsoft Excel articles. We have previously written the following articles in Microsoft Excel.

Introduction

This time we are back with a minor but an important feature of Microsoft Excel. When we have hundreds of records in our sheet, there might be duplicate records too. Mainly a sheet is created using teamwork and by merging multiple sheets.

Here we will simplify the process with two steps. First we will use a built-in tool to detect duplicate records and then use other tools that are useful to delete the duplicate records.

Let's have a look at the following screenshot that is a file of master data of drivers on a transport company.

Detecting Duplicates

Detecting Duplicate Values
Figure 1:
Detecting Duplicate Values

In the preceding screen, we have highlighted a few records that are duplicates, but whereas this file has more than 4000 rows, it will take lots of time to detect duplicates one by one. So, we will use Microsoft Excel's tool called conditional formatting. This tool can format all records that have duplicate values with a separate fill and color that helps us to filter or sort using color and identify separately.

Detect Duplicate
Figure 2: Detect Duplicate

In the preceding screen, we have used the following procedure:

  • First we selected the column to be matched for duplicate content. Here we have selected the Name field.
  • Then from the Home tab, we have selected the Conditional Formatting command.
  • When a complete list of Conditional Formatting commands appear, we hovered over Highlight Cells Rules and choose Duplicate Values from the bottom-most option.

Now let's see what we got here.

Duplicate records highlighted with Live Preview
Figure 3: Duplicate records highlighted with Live Preview

After selecting the color combination from the preceding dialogue, we have clicked OK in the dialogue screen that is not visible in the preceding because of expanded dropdown values.

Now you can compare that was to be kept in case you want and rename that, else leave it as it is. If you sort your sheet with color, now you can see all duplicate values in one place. Now the another part of job, to delete the duplicate records.

Deleting Duplicate Records

We have now identified and corrected the duplicate records in the sheet. Now the next step is to clear the unwanted duplicate records. Let's have a look at the following screen.

Deleting Duplicate Records
Figure 4: Deleting Duplicate Records

Now, in the preceding screen, we clicked in the Data Tab to list its commands among which Remove Duplicates is one and is required in this guide. We are already inside the sheet, so there is no need to select any other rows or columns. Now just click on the Remove Duplicates button, this will produce a dialogue that will have all the column titles and a few useful commands. One of them is My Data that has headers that will exclude the header row from matching.

Now the most important and critical is to select only the column to be validated and expected to be unique. I suggest, only one column that is set to be unique to check and click on the OK button at the end of the dialogue.

Deleted confirmation with records count
Figure 5: Deleted confirmation with records count

The procedure is finished, your data is now junk free, you managed to clean up the duplicate records within a few clicks. The first step specified in our article is about detecting the duplicate records but is not related to removal of duplicate values, but we showed here for better illustration and safety of your data. We also recommend you delete the data after copying it to a a separate sheet so that in case of future need, you can compare what is deleted and recover lost data.

Thanks for the reading. Do not forget to share your views.