Remove Duplicate Reversed Pairs

Introduction

In this article, we will learn how to remove reversed number pairs from a given table, keeping only one pair (either one) from each reversed set in the output table. This SQL problem is asked often in interviews, and multiple companies have used it in data analyst, data science, and data engineering interviews. For simplicity, we assume that each pair appears in the input table only once: a pair such as (1,2) and its reverse (2,1) each exist at most once, and neither is repeated two or three times. This is the problem statement we are going to solve in this article.

Using join operation and where clause

First, we need to create a database (named example in my case) and then a table (named details in my case). Then insert some dummy data using the SQL query below:

create database example;
use example;
create table details(A int, B int);
insert into details values(1,2);
insert into details values(3,2);
insert into details values(2,4);
insert into details values(2,1);
insert into details values(5,6);
insert into details values(4,2);
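
The introduction assumes that no pair is stored more than once. As a minimal sketch for checking that assumption on the details table (the occurrences alias is my own naming, not part of the original), a grouped count lists any pair that appears more than once. On the sample data above it returns no rows.

```sql
-- Sketch: list any (A, B) pair stored more than once in details.
-- "occurrences" is an illustrative alias, not a name from the article.
SELECT
  A,
  B,
  COUNT(*) AS occurrences
FROM details
GROUP BY A, B
HAVING COUNT(*) > 1;
```

If this query returns rows, the uniqueness assumption does not hold for that table, and the join-based approach in this article may behave differently from what is described here.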

Now check the table data using the query below:

select * from details;

Output

  

Let's solve the problem using a join operation. First, here is how it works. We self join the table with a condition that chains each pair to its reversed pair: (1,2) with (2,1), and (2,4) with (4,2). Then we apply a WHERE condition to eliminate one pair of each set, either (1,2) or (2,1). The ON condition is a little tricky, but not hard to understand. Call the table t1, and call the second copy used in the self join t2. The ON condition checks two things: the value of column B in t1 must equal the value of column A in t2, and the value of column A in t1 must equal the value of column B in t2.

That is our join condition, and writing it out makes it clearer. We do a left join of the details table with itself, naming the second copy t2, on t1.B = t2.A and t1.A = t2.B. You can use the code below directly:

SELECT *
FROM details t1
LEFT JOIN details t2
  ON t1.B = t2.A
  AND t1.A = t2.B;

Output

In the snapshot below, the blue-highlighted data comes from table t1 and the red-highlighted data comes from table t2.

You can see how the pair (1,2) got chained with (2,1) by the join condition above. For (3,2), no reversed pair is available, so the t2 columns are null. That is why I use a left join: we need these unmatched pairs in our output, and we will use the null values to pick them out. For (2,4), the reversed pair is (4,2). For (2,1), the condition is satisfied as well, because this is a self join and the row finds (1,2) as its reversed pair. For (5,6), no reversed pair exists, so its t2 values are null, and for (4,2) we find (2,4).

Now we need to produce the output. From each set of reversed pairs, such as (1,2) and (2,1), we want a single row. We pick it by requiring that the value of column A from t1 be less than the value of column A from t2. This comparison keeps (1,2) and discards (2,1), and likewise keeps (2,4) and discards (4,2). We also need the rows that have no reversed pair, so we add a check for t2.A being null. In that case we keep the row, and no comparison is needed.

So the query selects t1.A and t1.B from the join output and applies the WHERE clause: if t2.A is null, we keep the row straight away; otherwise, we keep it only when t1.A is less than t2.A. That is how we pick one of the two pairs, such as (1,2) out of (1,2) and (2,1). This gives the output we were looking for. You can use the code below directly:


SELECT
  t1.A,
  t1.B
FROM details t1
LEFT JOIN details t2
  ON t1.B = t2.A
  AND t1.A = t2.B
WHERE t2.A IS NULL
OR t1.A < t2.A;

Output
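
One edge case the query above does not cover: a row whose two values are equal, such as (3,3), is its own reversed pair. It joins to itself, t2.A is not null, and the strict comparison t1.A < t2.A is false, so the row is dropped entirely. The sample data contains no such row, so the following is only a sketch under the assumption that self-pairs can occur and should be kept once. The pairs CTE and the extra (3,3) row are hypothetical additions for illustration, and the CTE syntax assumes a database that supports WITH (for example SQL Server or MySQL 8 and later).

```sql
-- Sketch: same approach, but with <= so a self-pair such as (3,3) survives.
-- "pairs" is a hypothetical CTE: the details rows plus one made-up self-pair.
WITH pairs (A, B) AS (
  SELECT A, B FROM details
  UNION ALL
  SELECT 3, 3            -- hypothetical row, not in the sample data
)
SELECT
  t1.A,
  t1.B
FROM pairs t1
LEFT JOIN pairs t2
  ON t1.B = t2.A
  AND t1.A = t2.B
WHERE t2.A IS NULL       -- no reversed pair exists: keep the row
OR t1.A <= t2.A;         -- reversed pair exists: keep the smaller-A side, or a self-pair
```

For rows with a distinct reversed partner, t1.A never equals t2.A, so switching from < to <= changes nothing for them. It only affects rows that match themselves.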

Conclusion 

Here, we used a join operation and a WHERE clause to solve the problem statement. This approach is useful for removing duplicate reversed pairs.

If you have any questions or suggestions about the article, please leave them in the comment section below. Follow C# Corner to learn more about SQL Server and to explore more technologies.

Thanks for reading and I hope you like it.

 

