SQL - Interesting Queries

SQL, the Structural Query Language, seems simple, but often with some tricks in them. This article will discuss some interesting, and maybe complex queries, and list the regular ones as references at the bottom.
 

Set up Table and initial Data

 
Set up Test Database Environment
  1. USE [TestDB]  
  2. GO  
  3. /****** Object: Table [dbo].[Employees] Script Date: 5/28/2021 7:42:51 PM ******/  
  4. SET ANSI_NULLS ON  
  5. GO  
  6. SET QUOTED_IDENTIFIER ON  
  7. GO  
  8. CREATE TABLE [dbo].[Employees](  
  9. [Id] [int] IDENTITY(1,1) NOT NULL,  
  10. [Name] [nvarchar](maxNOT NULL,  
  11. [Salary] [intNOT NULL,  
  12. [ManagerId] [intNOT NULL,  
  13. CONSTRAINT [PK_dbo.Employees] PRIMARY KEY CLUSTERED  
  14. (  
  15.    [Id] ASC  
  16. )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFFON [PRIMARY]  
  17. ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]  
  18. GO  
Insert initial Data
  1. insert dbo.Employees  
  2. Values ('Greg', 100000, 1),  
  3.        ('George', 150000, 1),  
  4.        ('Helen', 130000, 1),  
  5.        ('Tom', 120000, 2),  
  6.        ('Kevin', 110000, 2),  
  7.        ('David', 120000, 3),  
  8.        ('Geek', 110000,3),  
  9.        ('Tesla', 120000,3),
  10.        ('David', 120000,3),
  11.        ('Kevin', 110000,2)    
The result will be,
 
 

Questions

 
Question 1 - Query to find Second Highest Salary of Employee?
 
This is a very popular SQL question, one can find out it on different sites with multiple solutions.
 
Answer 1
  1. SELECT MAX(salary) [Second Highest Salary]  
  2. FROM employees  
  3. WHERE salary NOT IN (SELECT MAX(salary) FROM employees)  
Result
 
 
This solution is most effiecient and is the best solution. It is short and easy to understand.
 
 
We use the SUBQUERY in the WHERE Clause. The solution is straight forward,
  • SubQuery (Inner Query in the graph) SELECT the MAX salary from the employees table.
  • Main Query (Outer Query in the graph) SELECT the MAX salary from the employees table without the MAX salary.
In this article, How to find Second Highest salary in SQL?, it gives more than 10 ways for this question, but we are not interested in getting the same solution with more complex logic.
 
There ARE some equivalent solutions to what we have in Answer 1, and some with advantages for the extended question such as find Third (Nth) Highest Salary of Employee?
 
Answer 1-2,  
  1. SELECT MAX(salary) [Second Highest Salary]  
  2. FROM employees  
  3. WHERE salary < (SELECT MAX(salary) FROM employees)  
This is exactly equivalent to Answer 1.
 
Answer 1-3,
  1. SELECT MAX(salary) [Second Highest Salary]  
  2. FROM employees  
  3. WHERE salary NOT IN (SELECT DISTINCT TOP 1 salary   
  4.                      FROM employees  
  5.                      ORDER BY salary DESC)  
The logic of this solution is similar to Answer 1, but SubQuery uses SELECT TOP Clause.  The advantage for this is that if we replace 1 in SubQuery by 2, we can get the solution for finding the Third Highest Salary of Employee, such as
  1. SELECT MAX(salary)  
  2. FROM employees  
  3. WHERE salary NOT IN (SELECT DISTINCT TOP 2 salary   
  4.                      FROM employees  
  5.                      ORDER BY salary DESC)  
For which, if we try to use Answer 1 to get the solution, it is possible, but a bit complex,
  1. SELECT MAX(salary) [Second Highest Salary]  
  2. FROM employees  
  3. WHERE salary NOT IN (  
  4.                     SELECT MAX(salary)   
  5.                     FROM employees  
  6.                     )  
  7. AND salary NOT IN (  
  8.                     SELECT MAX(salary)   
  9.                     FROM employees  
  10.                     WHERE salary NOT IN (  
  11.                                         SELECT MAX(salary)   
  12.                                         FROM employees  
  13.                                         )  
  14.                     )  
where,
  • NOT IN the first SubQuery --- get rid of the top max value as Answer 1 did, and 
  • the second SubQuery is just the Answer 1 itself 
  • NOT IN the second SubQuery --- get rid of the second max value
Therefore, we got the third-highest salary, however, with quite complex logic --- this is what we do not like.
 
Furthermore, if we want the Nth Highest Salary of Employee, it will be hard by Answer 1 or Answer 1-2, but we can easy to get it by Answer 1-3 by replacing 1 by N-1, then we have,
  1. SELECT MAX(salary) [Nth Highest Salary]  
  2. FROM employees  
  3. WHERE salary NOT IN (SELECT DISTINCT TOP (N-1) salary   
  4.                      FROM employees  
  5.                      ORDER BY salary DESC)  
Answer 1-4,
  1. SELECT DISTINCT salary [Second Highest Salary]   
  2. FROM employees e1   
  3. WHERE 2 = (SELECT COUNT(DISTINCT salary) FROM employees e2   
  4. WHERE e1.salary <= e2.salary);  
This is equivalent to Answer 1-3, and also can be used to get Nth Highest Salary of Employee. However, it is a bit hard to understand its logic. Let us discuss it step by step,
 
First, the Inner Query,
  1. SELECT COUNT(DISTINCT salary) FROM employees  
Return - 5, that is the category of salary:
  1. SELECT salary FROM employees  
  2. GROUP BY salary  
Return
 
 
Then we add the condition into the Sub Query, and let it == 2,
  1. 2 = (SELECT COUNT(DISTINCT salary) FROM employees e2     
  2. WHERE e1.salary <= e2.salary)  
Satisfy the condition for e1.salary from the Major (Outer) Query must be the 130000, the second-highest salary as required.
 
Question 2 - Find and Remove Duplicate Rows from a Database Table?
 
Finding Duplicate Rows is relatively easy to do while locating the positions for the Duplicate Rows, and therefore Removing them is a bit difficult.
 
Finding Duplicate Rows
  1. SELECT name, salary, managerid, COUNT(salary) Records 
  2. FROM employees  
  3. GROUP BY name, salary, managerid  
  4. HAVING COUNT(salary) > 1  
The result,
 
 
We group all records except the identity or primary key that must be different for a different records. 
 
Removing Duplicate Rows with an identity[ref]
 
We first need to locate the position of the duplicate records, which is easier when the table has an identity on it.
 
Use the following query to select the duplicate records,
  1. SELECT *   
  2. FROM employees   
  3. WHERE Id NOT IN (SELECT MIN(Id)   
  4. FROM employees  
  5. GROUP BY name, salary, ManagerId)  
Those are,
 
 
Then Delete them,
  1. DELETE  
  2. FROM employees   
  3. WHERE Id NOT IN (SELECT MIN(Id)   
  4. FROM employees  
  5. GROUP BY name, salary, ManagerId)  
Result
 
Records 9, 10 are deleted,
 
 
Removing Duplicate Rows withOUT an identity --- Oracle[ref]
 
For the Oracle database, it is easier because Oracle has a ROWID pseudo column for a table, then we can do the job as.
 
Locate the duplicate Records,
  1. SELECT *  
  2. FROM employees  
  3. WHERE ROWID NOT IN (SELECT MIN(ROWID)  
  4. FROM employees  
  5. GROUP BY name, salary, ManagerId)  
Delete the duplicate Records,
  1. DELETE  
  2. FROM employees  
  3. WHERE ROWID NOT IN (SELECT MIN(ROWID)  
  4. FROM employees  
  5. GROUP BY name, salary, ManagerId)  
Removing Duplicate Rows withOUT an identity --- SQL Server
 
We can use CTE (Common Table Expression) with ROW_NUMBER[reffunction.
 
Locate the duplicate Records[ref],
  1. ;WITH CTE AS  
  2. (  
  3.    SELECT name, salary,managerid,  
  4.    ROW_NUMBER() OVER (PARTITION BY name, salary, managerid 
  5.    Order by name) AS Rnum  
  6.    FROM employees  
  7. )  
  8. SELECT * From CTE  
Result
 
 
Delete the duplicate Records: Rnum > 1,
  1. ;WITH CTE AS    
  2. (    
  3.    SELECT name, salary,managerid,    
  4.    ROW_NUMBER() OVER (PARTITION BY name, salary, managerid   
  5.    Order by nameAS Rnum    
  6.    FROM employees    
  7. )    
  8. --SELECT * From CTE    
  9. DELETE FROM CTE WHERE Rnum <> 1    
We can also do the job without using CTE[ref],
  1. DELETE DUP    
  2. FROM    
  3. (    
  4.     SELECT ROW_NUMBER() OVER (PARTITION BY name, salary, managerid   
  5.     Order by nameAS Rnum    
  6.     FROM employees    
  7. ) DUP    
  8. WHERE DUP.Rnum > 1;    

Summary

 
We choose two queries at this first time trying,
  • Query to find Second Highest Salary of Employee? 
  • Find and Remove Duplicate Rows from a Database Table?
If I find some other interesting Query, I will add them to this article.

References

Concept