Different Ways To Find And Delete Duplicate Rows From A Table In SQL Server

In this article, we will learn how to find and delete duplicate values from a table in SQL Server.

Sometimes we may find duplicate values in a table, and we have to find out the duplicate values from that table. In this article, we will find duplicate values using "Group By and Having" clause, "Common Table Expressions (CTE)" and "Rank" function.

For this operation we will create a table named as “employee” and add some records to it.

CREATE TABLE [dbo].[employee](
       [empid] [int] IDENTITY(1,1) NOT NULL,
       [empname] [nvarchar](50) NULL,
       [empaddress] [nvarchar](50) NULL
) ON [PRIMARY]

INSERT INTO [dbo].[employee] VALUES('Amit Mohanty', 'Hyderabad')
INSERT INTO [dbo].[employee] VALUES('Swarup Pradhan', 'Bhubaneswar')
INSERT INTO [dbo].[employee] VALUES('Alok Pradhan', 'Bhubaneswar')
INSERT INTO [dbo].[employee] VALUES('Amanda Murphy', 'Bengaluru')
INSERT INTO [dbo].[employee] VALUES('Kartik Reddy', 'Hyderabad')
INSERT INTO [dbo].[employee] VALUES('Swarup Pradhan', 'Bhubaneswar')
INSERT INTO [dbo].[employee] VALUES('Amit Mohanty', 'Hyderabad')
INSERT INTO [dbo].[employee] VALUES('Amanda Murphy', 'Bengaluru')
INSERT INTO [dbo].[employee] VALUES('Swarup Pradhan', 'Bhubaneswar')
INSERT INTO [dbo].[employee] VALUES('Amit Mohanty', 'Hyderabad')
INSERT INTO [dbo].[employee] VALUES('Swarup Pradhan', 'Hyderabad')
INSERT INTO [dbo].[employee] VALUES('Malaya Kabiraj', 'Kolkata')

By Using Group By and Having clause

By using Group By clause we can group all data of a table as per our defined columns and count the duplicate occurrence of rows we can use COUNT function.

For example,

SELECT [empname], [empaddress], [duplicate] = COUNT(*)
FROM [dbo].[employee]
GROUP BY [empname], [empaddress]
HAVING COUNT(*) > 1;

In the above query, we will get all data having duplicated more than one, and the “[duplicate]” column shows how many duplicate records are there.

Now we need to keep one record and delete all duplicate records. For this, we can use the “MIN” function to get the first record of all duplicate records.

SELECT * FROM [dbo].[employee]
WHERE [empid] NOT IN (SELECT MIN([empid])
FROM [dbo].[Employee]
GROUP BY [empname], [empaddress]);

In the above query, we will exclude the minimum id of each duplicate row.

To delete the duplicate records we can use the below query.

DELETE FROM [dbo].[Employee]
WHERE [empid] NOT IN (SELECT MIN([empid])
FROM [dbo].[Employee]
GROUP BY [empname], [empaddress]);

By Using Common Table Expressions (CTE)

Here we use “Row_Number” function along with Common Table Expressions to find out the duplicate values. CTE is available SQL Server 2005 or higher version.

;WITH CTE([empname], [empaddress], [duplicate])
AS (SELECT [empname], [empaddress], ROW_NUMBER() OVER(PARTITION BY [empname], [empaddress] ORDER BY [empid]) AS [duplicate]
FROM [dbo].[employee])
SELECT * FROM CTE;

In the above query “[duplicate]” column gives unique row id for each row of the duplicate row.

If we want to delete duplicate record using CTE then we have to add a “Where” clause with condition “[duplicate] > 1”.

;WITH CTE([empname], [empaddress], [duplicate])
AS (SELECT [empname], [empaddress], ROW_NUMBER() OVER(PARTITION BY [empname], [empaddress] ORDER BY [empid]) AS [duplicate]
FROM [dbo].[employee])
DELETE FROM CTE WHERE [duplicate] > 1;

The above query deletes all duplicate records having “[duplicate]” greater than 1.

By Using RANK() Function

By using “Rank()” function also we can delete duplicate records. Here we will use Rank function along with “PARTITION BY” clause and “INNER JOIN”.

SELECT e.[empid], e.[empname], e.[empaddress], t.[rank]
FROM [dbo].[employee] e
INNER JOIN (SELECT *, RANK() OVER(PARTITION BY [empname], [empaddress] ORDER BY [empid]) AS [rank]
FROM [dbo].[Employee]) t ON e.[empid] = t.[empid];

In the above query “[rank]” column gives unique row id for each row of the duplicate row like the CTE query.

To delete duplicate records here also we have to add “Where” clause with condition “[rank] > 1”.

DELETE e FROM [dbo].[employee] e
INNER JOIN (SELECT *, RANK() OVER(PARTITION BY [empname], [empaddress] ORDER BY [empid]) AS [rank]
FROM [dbo].[Employee]) t ON e.[empid] = t.[empid] WHERE [rank] > 1;

In this article, we have learned how to find and delete duplicate values in a table by using “Group By and Having” clause, “Common Table Expressions (CTE)” and “Rank” function in SQL Server.

Hope this will help the readers. Happy Coding !!!