Deleting Duplicate Records From a Table in MS SQL

Deletion of duplicate records from a table is tricky and can be done in various ways. One simple approach can be applied as described in this article using a Group By and sub-query.

This article gives a simple tutorial of deleting duplicate records from a table in Microsoft SQL Server. Here we will just see a simple way to get a count of duplicate records and delete them from the table.

This article is intended to cover the basic approach for the following questions:

  • How many unique records are there with their duplicate count?
  • How to delete all duplicate records by applying record uniqueness on other columns?

Deletion of duplicate records from a table is tricky and can be done in various ways. One simple approach can be applied as in the following using Group By and a sub-query. The usage syntax can be:

DELETE FROM myTableName
WHERE AutoGeneratedUniqueID NOT IN
(
       SELECT max( AutoGeneratedUniqueID )
       FROM myTableName
       GROUP BY col_1_considering_duplicate, col_2_considering_duplicate

[,other considering cols]

)

Setting the environment to understand it with an example

We will understand it by an example. We will create a table and fill it with some duplicate records. Considering an Automobile Garage application where somehow duplicate labour charges are coming for a particular vehicle model and labour. Automobile application's user/executive is in very big dilemma about how to charge his customers.

But the technical person guesses that the culprit is the table tblLabourCharge that stores the charges of labour by model and has some duplicate records. This table is intended to have one record for each labour of a particular model. Example: model "AA" with a labour code "5PQRS" should exist only once in the table. 

To help uunderstand, we will first create two additional tables tblModel and tblLabour.

Creating the Model table and inserting records into it as:

CREATE TABLE [dbo].[tblModel](
       [ModelID] [int] IDENTITY(1,1) NOT NULL primary key,
       [ModelCode] [nchar](2) NULL,
       [Description] [varchar](200) NULL,
       [CreatedBy] [varchar](10) NULL,
       [CreatedOn] [datetime] NULL,
)
Insert INTO tblModel
VALUES
 
('AA','Honda City','user9',getdate())
,(
'AB','Volvo ex','user9',getdate())
,(
'AC','Mahindra Suv 500','user9',getdate())
,(
'AD','Tata Indigo','user9',getdate())

Creating the Labour table and inserting records into it as:

CREATE TABLE [dbo].[tblLabour](

       [LabourID] [int] IDENTITY(1,1) NOT NULL primary key,

       [LabourCode] [nchar](5) NULL,

       [Description] [varchar](200) NULL,

       [CreatedBy] [varchar](10) NULL,

       [CreatedOn] [datetime] NULL,

)
Insert INTO tblLabour
VALUES
 ('5PQRS','Vehicle washing and cleaning','user9',getdate())
,(
'5ABCD','Stearing overhauling','user9',getdate())
,(
'5MLNR','Engine overhauling','user9',getdate())

Creating the Labour-Charges table and inserting records (with duplicates) into it as:

CREATE TABLE [dbo].[tblLabourCharges](
       [LabourChargeID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
       [ModelCode] [nchar](2) NOT NULL,
       [LabourCode] [nchar](5) NOT NULL,
       [LabourRatePerHour] [money] NULL,
       [RateEnforceDate] [datetime] NULL,
       [CreatedBy] [varchar](50) NULL,
       [CreatedOn] [datetime] NULL
)
Insert INTO tblLabourCharges
VALUES
 ( 'AA','5PQRS',100.00,getdate(),'user1', getdate())
,( 'AB','5ABCD',200.00,getdate(),'user1', getdate())
,( 'AA','5PQRS',100.00,getdate(),'user1', getdate())
,( 'AC','5ABCD',250.00,getdate(),'user1', getdate())
,( 'AB','5ABCD',100.00,getdate(),'user1', getdate())
,( 'AD','5MLNR',6500.00,getdate(),'user1', getdate())

 

So far we have inserted the required records into our tables tblModel, tblLabour, and tblLabourCharges. Also, we have inserted 2 duplicate records in table tblLabourCharges. Let's see the records:

SELECT * from tblModel

SELECT * from tblLabour

SELECT * from tblLabourCharges

removing duplicates select all.png
 

From above query results we can see there are two duplicate records in tblLabourCharges (on the basis of ModelCode + LabourCode only) i.e.

'AA' model with '5PQRS' labour code and

'AB' model with '5ABCD' labour code.

 

We can find the duplicate records count on the basis of (model code & labour code) using the following query:

SELECT ModelCode, LabourCode, count(LabourRatePerHour) AS DuplicateCount
FROM tblLabourCharges
GROUP BY ModelCode ,LabourCode 
ORDER BY ModelCode

duplicate record count.png
 

We have counted duplicate records and "DuplicateCount" which is greater than 1 is indicating a duplicity.

Now, we will finally fire the duplicate records deletion query using:

DELETE FROM tblLabourCharges
WHERE LabourChargeID NOT IN
(
SELECT max( LabourChargeID)
FROM tblLabourCharges
GROUP BY ModelCode, LabourCode
)

And we will see the result as in the following where there are no duplicates (on modelCode + LabourCode) remaining:

duplicate record deleted.png

So it was a very simple approach to delete the duplicate records from the table.