How to Remove Duplicate Records in SQL?

Abhishek Yadav

Dec 04

278

Solution:

WITH cte AS (SELECT *,ROW_NUMBER() OVER (PARTITION BY StudentName, Subject, MarksORDER BY Id) AS rnFROM StudentExams
)
DELETE FROM cte WHERE rn > 1;

Explanation:

PARTITION BY groups similar rows.
ROW_NUMBER() assigns a sequence number to each row within the group.
Rows with rn > 1 are duplicates and are removed.

To remove duplicate records in SQL, you can use a query with functions like ROW_NUMBER() to identify duplicate rows based on specific columns, then delete the extra records while keeping one unique entry. This is commonly done using a Common Table Expression (CTE) that assigns a row number to each record within a group of duplicates, allowing you to safely remove the unwanted rows without losing valid data.

Lawrence Chapman

Dec 19

Removing duplicate records in SQL depends on how you define a duplicate and which database you’re using (SQL Server, MySQL, Oracle, PostgreSQL). Below are the most common and practical approaches, explained step-by-step.

1️⃣ Identify Duplicate Records (Best Practice – First Step)

Example Table

Employee(Id, Name, Email)

Find duplicates based on Email

SELECT Email, COUNT(*) AS Total
FROM Employee
GROUP BY Email
HAVING COUNT(*) > 1;

2️⃣ Remove Duplicates Using `ROW_NUMBER()` (Recommended)

✅ Works in SQL Server, MySQL 8+, PostgreSQL, Oracle

Keep the latest record (highest Id)

WITH CTE AS (SELECT *,ROW_NUMBER() OVER (PARTITION BY Email ORDER BY Id DESC) AS rnFROM Employee
)
DELETE FROM CTE
WHERE rn > 1;

✔ Keeps one record per Email
✔ Deletes all extra duplicates

3️⃣ Remove Duplicates Without CTE (MySQL < 8)

DELETE e1
FROM Employee e1
JOIN Employee e2
ON e1.Email = e2.Email
AND e1.Id > e2.Id;

✔ Keeps record with lowest Id

4️⃣ Remove Duplicates Using `DISTINCT` (Create Clean Table)

⚠ Use when table has no primary key

CREATE TABLE Employee_New ASSELECT DISTINCT *FROM Employee;DROP TABLE Employee;ALTER TABLE Employee_New RENAME TO Employee;

5️⃣ Remove Duplicates Using `GROUP BY`

DELETE FROM Employee
WHERE Id NOT IN (SELECT MIN(Id)FROM EmployeeGROUP BY Email
);

✔ Keeps one row per Email
❌ Slower for large tables

6️⃣ Prevent Future Duplicates (Very Important 🔐)

Add UNIQUE Constraint

ALTER TABLE Employee
ADD CONSTRAINT UQ_Employee_Email UNIQUE (Email);

Now duplicates can’t be inserted again 🚀

7️⃣ SQL Server Specific (Using Temp Table)

SELECT *INTO #TempEmployee
FROM (SELECT *,ROW_NUMBER() OVER (PARTITION BY Email ORDER BY Id) rnFROM Employee
) t
WHERE rn = 1;TRUNCATE TABLE Employee;INSERT INTO Employee
SELECT * FROM #TempEmployee;

🧠 Which Method Should You Use?

Scenario	Best Method
Large table	`ROW_NUMBER()`
Old MySQL	Self JOIN
No primary key	`DISTINCT`
Prevent duplicates	`UNIQUE constraint`

Ajay Bansode

Jan 12

Insert Link

Embed YouTube Video

Table Options

Insert Image