2
Reply

How to Remove Duplicate Records in SQL?

Abhishek Yadav

Abhishek Yadav

Dec 04
278
1
Reply

Solution:

WITH cte AS (SELECT *,ROW_NUMBER() OVER (PARTITION BY StudentName, Subject, MarksORDER BY Id) AS rnFROM StudentExams
)
DELETE FROM cte WHERE rn > 1;

Explanation:

  • PARTITION BY groups similar rows.

  • ROW_NUMBER() assigns a sequence number to each row within the group.

  • Rows with rn > 1 are duplicates and are removed.

    To remove duplicate records in SQL, you can use a query with functions like ROW_NUMBER() to identify duplicate rows based on specific columns, then delete the extra records while keeping one unique entry. This is commonly done using a Common Table Expression (CTE) that assigns a row number to each record within a group of duplicates, allowing you to safely remove the unwanted rows without losing valid data.

    Removing duplicate records in SQL depends on how you define a duplicate and which database you’re using (SQL Server, MySQL, Oracle, PostgreSQL). Below are the most common and practical approaches, explained step-by-step.


    1️⃣ Identify Duplicate Records (Best Practice – First Step)

    Example Table

    Employee(Id, Name, Email)
    

    Find duplicates based on Email

    SELECT Email, COUNT(*) AS Total
    FROM Employee
    GROUP BY Email
    HAVING COUNT(*) > 1;

    2️⃣ Remove Duplicates Using ROW_NUMBER() (Recommended)

    ✅ Works in SQL Server, MySQL 8+, PostgreSQL, Oracle

    Keep the latest record (highest Id)

    WITH CTE AS (SELECT *,ROW_NUMBER() OVER (PARTITION BY Email ORDER BY Id DESC) AS rnFROM Employee
    )
    DELETE FROM CTE
    WHERE rn > 1;
    

    ✔ Keeps one record per Email
    ✔ Deletes all extra duplicates

    3️⃣ Remove Duplicates Without CTE (MySQL < 8)

    DELETE e1
    FROM Employee e1
    JOIN Employee e2
    ON e1.Email = e2.Email
    AND e1.Id > e2.Id;
    

    ✔ Keeps record with lowest Id


    4️⃣ Remove Duplicates Using DISTINCT (Create Clean Table)

    ⚠ Use when table has no primary key

    CREATE TABLE Employee_New ASSELECT DISTINCT *FROM Employee;DROP TABLE Employee;ALTER TABLE Employee_New RENAME TO Employee;
    

    5️⃣ Remove Duplicates Using GROUP BY

    DELETE FROM Employee
    WHERE Id NOT IN (SELECT MIN(Id)FROM EmployeeGROUP BY Email
    );
    

    ✔ Keeps one row per Email
    ❌ Slower for large tables


    6️⃣ Prevent Future Duplicates (Very Important 🔐)

    Add UNIQUE Constraint

    ALTER TABLE Employee
    ADD CONSTRAINT UQ_Employee_Email UNIQUE (Email);
    

    Now duplicates can’t be inserted again 🚀


    7️⃣ SQL Server Specific (Using Temp Table)

    SELECT *INTO #TempEmployee
    FROM (SELECT *,ROW_NUMBER() OVER (PARTITION BY Email ORDER BY Id) rnFROM Employee
    ) t
    WHERE rn = 1;TRUNCATE TABLE Employee;INSERT INTO Employee
    SELECT * FROM #TempEmployee;
    

    🧠 Which Method Should You Use?

    ScenarioBest Method
    Large tableROW_NUMBER()
    Old MySQLSelf JOIN
    No primary keyDISTINCT
    Prevent duplicatesUNIQUE constraint