ARTICLE

Remove Duplicate Records in SQL Server 2005

Posted by Srividhya Sridhar Articles | SQL Server 2012 July 01, 2010
In this article you will learn how to Remove duplicate records in SQL Server 2005.
Reader Level:

SQL Server tables should never contain duplicate rows, if there is a key constraints. Table with no key constraints allow the duplicates. It happens in many ways such as import data from other resources.

Nothing to worry, write a query and make your database without duplicates.

First we need a table and some data to explain the process. Use the following script to create a table and insert a record.

CREATE TABLE Dup_Students (ID INT, FirstName varchar(25), Department Char(2))

CREATE TABLE Dup_Students (ID INT, FirstName varchar(25), Department Char(2))
 
INSERT INTO Dup_Students VALUES(1, 'Jack', 'IT')
INSERT INTO Dup_Students VALUES(2, 'Alice', 'ME')
INSERT INTO Dup_Students VALUES(3, 'James', 'EE')
INSERT INTO Dup_Students VALUES(4, 'Nickle', 'CE')
INSERT INTO Dup_Students VALUES(5, 'George', 'IT')
--
INSERT INTO Dup_Students VALUES(1, 'Jack', 'IT')
INSERT INTO Dup_Students VALUES(2, 'Alice', 'ME')
INSERT INTO Dup_Students VALUES(3, 'James', 'EE')
INSERT INTO Dup_Students VALUES(4, 'Nickle', 'CE')

--
INSERT INTO Dup_Students VALUES(1, 'Jack', 'IT')
INSERT INTO Dup_Students VALUES(2, 'Alice', 'ME')

Now you can view the number of duplicate records in each row.

SELECT ID, FirstName, Department, Count(*) as DuplicateCount
From Dup_Students
group by ID,FirstName, Department

Before deleting you can view the records with row number for the duplicates. This result set is going to act as a temp table for the delete process.

SELECT ID, FirstName, Department,
ROW_NUMBER() OVER(PARTITION BY ID, FirstName ORDER BY ID) AS DuplicateRowCount
FROM Dup_Students

We are going to delete all the duplicates using CTE (Common table expression) and ROW_NUMBER(), which is a new in SQL server 2005.

WITH Dup_Students_CTE (ID, FirstName, Department, DuplicateRowCount)
AS
(
SELECT ID, FirstName, Department,
ROW_NUMBER() OVER(PARTITION BY ID, FirstName ORDER BY ID) AS DuplicateRowCount
FROM Dup_Students
)
DELETE FROM Dup_Students_CTE
WHERE DuplicateRowCount > 1
GO

Login to add your contents and source code to this article
post comment
     

I dont know y to use "partition" .m not much aware about the sql server fundamentals.

Posted by Shilpa kamthan Aug 18, 2010

Hi giammin,
In this case all the columns are identical including ID. So this is a simple way to delete the duplicate records. Go through once again.

regards,
SrividhyaSridhar

Posted by Srividhya Sridhar Jul 01, 2010

you are taking the long, dangerous way:


that is easiest:

DELETE FROM Dup_Students WHERE ID NOT IN
(SELECT min(ID) FROM Dup_Students GROUP BY FirstName, Department )

Posted by giammin Jul 01, 2010
COMMENT USING
PREMIUM SPONSORS
Over-C is a holistic consortium of communications and technology specialists. We build, deploy and market both business as well as consumer products and solutions.
Join a Chapter
SPONSORED BY
  • PDF reports have never been easier to create. With our included WYSIWYG Designer, you can layout your reports, set up your data source and let DynamicPDF ReportWriter do the rest.
Join a Chapter