Sub Queries In SQL Server

Usama Shahid
6y
17.4k
0
5

Article

Subqueries are nothing but literally a technique of nesting one query within the other. If you make nested queries up to multiple levels, then it will be the worst case scenario in your project and will affect the performance. Subqueries are also termed as Nested queries.

There are many places where we can use the subqueries but the most common places are:

As a search condition in a WHERE clause (most common)
Like a Table in a FROM clause
As a search condition in a HAVING clause
As a column in the SELECT clause

Let’s see a simple example of Subquery.

SELECT InvoiceId, CustomerId, Total
FROM Invoice
WHERE Total > (
SELECT AVG(Total)
FROM Invoice
)
ORDER BY Total DESC

In subqueries, we can use any number of tables like we’re doing here. We’re using the same table twice. So, you might get confused about the difference between Joins and Subqueries.

Comparison of Joins and Subqueries

Joins let you include columns from all the tables in the query. And subquery only includes the columns for the table in the outer query.

SELECT InvoiceId, CustomerId, Total
FROM Invoice
WHERE Total > (
SELECT AVG(Total)
FROM Invoice
)
ORDER BY Total DESC

So the explanation of the above point is that when we’re working with joins, we can include any column of any table in the joins query. Whereas when we’re working with subqueries, we can just select the columns of the outer query, as we have here.

SELECT InvoiceId, CustomerId, Total
FROM Invoice
WHERE Total > ...

Joins are easier to use and their performance is faster, whereas, Subquery has this benefit over the join that we can pass the aggregate values to the outer query with the help of a subquery.

Let’s see an example.

SELECT InvoiceId
, InvoiceDate
, Total
FROM Invoice
JOIN Customer
ON Invoice.CustomerId = Customer.CustomerId
WHERE Country = 'USA'
ORDER BY InvoiceDate DESC

This is the join query. Now, let’s write the same query in a subquery.

SELECT InvoiceId
, InvoiceDate
, Total
FROM Invoice
WHERE CustomerId IN (
SELECT DISTINCT CustomerId
FROM Customer
WHERE Country = 'USA'
)
ORDER BY InvoiceDate DESC

Another example of Subquery is where we’re applying the subquery at WHERE clause.

SELECT CustomerId, FirstName, LastName
FROM Customer
WHERE NOT EXISTS (
SELECT * FROM Invoice
WHERE Invoice.CustomerId = Customer.CustomerId
)

And we can put the same query in a little bit more different way.

SELECT CustomerId, FirstName, LastName
FROM Customer
WHERE CustomerId NOT IN (
SELECT DISTINCT CustomerId
FROM Invoice
)

Co-related Queries

Co-related queries are a type of variation of subquery or nested query. Both (co-related queries and subqueries) have outer queries and inner queries. But the big difference is in co-related queries that our inner queries also referencing the outer queries table.

This is the most common interview question to find the 2^nd most highest Salary. [Co-Related Query]

SELECT Name, Salary
FROM Employees e1
WHERE 2 = (
SELECT COUNT(e2.Salary)
FROM Employees e2
WHERE e2.Salary >= e1.Salary
)

And if we write the same query in [Subquery].

SELECT MAX(Salary)
FROM Employees
WHERE Salary NOT IN (
SELECT MAX(Salary)
FROM Employees
)

Look here, in subquery, both inner query and outer query are independent of each other. There are no references.

Now, you might be wondering how this query works actually. So for demonstration, let's see the below code.

SELECT * FROM Employees

And we’ll get.

Now let’s take a look at this query.

SELECT Name, Salary
FROM Employees e1
WHERE 2 = (
SELECT COUNT(e2.Salary)
FROM Employees e2
WHERE e2.Salary >= e1.Salary
)

Look in the inner query we’re referencing the outer query table.

So in the first iteration, let’s say e1,.Salary is 5000 and the check is WHERE e2.Salary is greater or equal to e1.Salary. So we’ll find 7 matches. And the Count (e2.Salary) will become 7 which is not equal to 2. It means 5000 is at the 7^th position in our table.

Similarly, let’s suppose e1.Salary value is 6500 which is greater than and equal to 2 matches in the complete table. Count(e2.Salary) will become 2, it means it is the 2^nd highest Salary.

Now let’s try one more time, e1.Salary is 5500. And we want to find the 3^rd highest Salary. And obviously, it is the 3^rd highest Salary.

SELECT Name, Salary
FROM Employees e1
WHERE 3 = (
SELECT COUNT(e2.Salary)
FROM Employees e2
WHERE e2.Salary >= e1.Salary
)

Now, when the inner executes, it will find 4 matches of 5500 in the table which means Count is 4 containing (5500, 5500, 6500, 7000). And here is the problem, we need to count just different Salaries. So:

SELECT Name, Salary
FROM Employees e1
WHERE 3 = (
SELECT COUNT(DISTINCT e2.Salary)
FROM Employees e2
WHERE e2.Salary >= e1.Salary
)

And now, our problem is solved. There is even a replication of Salaries multiple times in the table.