How To Find Duplicate Records In SharePoint List

During one of my assignments, I have come across a situation where we need to fix the data issues in SharePoint Lists. One of the issues that we found was the presence of duplicate data. In order to fix that problem, I developed a Powershell Script to find out the duplicate data based on a specific or a group of columns.

For the sake of demo, I have added a SharePoint List with some duplicate records in it, as shown below.

1

Now, let’s look into the code to understand implementation details-

In Step 1, we are getting references to the Site and Web where the SharePoint List resides

In Step 2, we are splitting the list of columns based on which we want to find out the duplicate data.

We can see there are two input variables “ColumnToValidate” and “ColumnToDisplay”. “ColumnToValidate” provides columns based on which duplicity needs to be checked while “ColumnToDisplay” contains the list of columns that need to be the part of the data export.

In Step 3, we are creating the export folder that will hold the CSV files exported with duplicate records.

In Step 4, we are creating the list object that will give the handle on the list which needs to be validated.

2

In Step 5, we are getting the list of Items from SharePoint List and grouping them based on the validation columns.

In Step 6, we are creating the directory for the export files.

In Step 7, we are exporting all the groups which are having item count greater than 1 (this logic identifies the duplicate items).

3

That is all for the code. Now, we will see the variation in outputs depending on the columns specified for duplicacy-check.

In Step 8, we specify the validation and display columns. For the first execution, we will check duplicate values in “Title” column.

In Step 9, we are calling the “Get-DuplicateListItems” function to find the duplicate values.

4

After the function executed successfully, we can see the following output.

In Step 10, we can see the output of this execution and can see 6 duplicate items found in the Title Column.

5

In Step 11, we can see the CSV file that is exported by the execution considering “Title” Column to be validated.

6

In Step 12, we see the output file and can notice duplicate values in “Title” Column.

7

In Step 13, we have changed the list of columns to be validated. In this second execution, I have added another column “Role”.

Now, the list will be validated for duplicity based on the combination of “Title & Role” columns.

8

In Step 14, we can see the output of this execution and see 4 duplicate items in “Title & Role” columns.

9

In Step 15, we see the CSV file that is exported by the execution considering “Title & Role” column to be validated.

10

In Step 16, we see the output file and can notice duplicate values in “Title & Role” Column.

11

In Step 17, we have changed the list of columns to be validated. In this second execution, I have added another column “Location”.

Now, the list will be validated for duplicity based on the combination of “Title & Role & Location” Columns.

12

In Step 18, we see the output of this execution and can see 2 items found which are duplicate in “Title & Role & Location” columns.

13

In Step 19, we see the CSV file that is exported by the execution considering “Title & Role & Location” column to be validated.

14

In Step 20, we see the output file and can notice duplicate values in “Title & Role & Location” column.

15

This is a very simple technique that can be used to fix one of the issues with SharePoint List data.

Hope, you find it helpful.

X

Build smarter apps with Machine Learning, Bots, Cognitive Services - Start free.

Start Learning Now