Join vs Exists in SQL Server

As an example, let's assume the following two tables, which we'll call tableA and tableB.

id name id title
-- ---- -- ----
1 Kenny 1 Analyst
1 Rob 2 Sales
4 Molly 3 Manager
1 Greg
2 John

If we wanted to get everyone that's an Analyst, we could do the following:

SELECT *
FROM tableA
WHERE tableA.id IN (SELECT tableB.id FROM tableB WHERE title = 'Analyst');
-- Returns 3 records - Kenny, Rob, and Greg

For those not very familiar with SQL, this should be relatively easy to understand. We have written a subquery that will get the id for the Analyst title in tableB. Using IN, we can then grab all of the employees from tableA who have that title.

While IN statements are fairly intuitive, they're often less efficient than the same query written as a JOIN or EXISTS statement would be.

To produce the same results as above, we can do the following:

-- EXISTS
SELECT *
FROM tableA
WHERE EXISTS (SELECT 1 FROM tableB WHERE title = 'Analyst' AND tableA.id = tableB.id);

-- JOIN (INNER is the default when only JOIN is specified)
SELECT *
FROM tableA
JOIN tableB
ON tableA.id = tableB.id
WHERE tableB.title = 'Analyst';

In most cases, EXISTS or JOIN will be much more efficient (and faster) than an IN statement. Why?

When using an IN combined with a subquery, the database must process the entire subquery first, then process the overall query as a whole, matching up based on the relationship specified for the IN.

With an EXISTS or a JOIN, the database will return true/false while checking the relationship specified. Unless the table in the subquery is very small, EXISTS or JOIN will perform much better than IN.

Furthermore, writing the query as a JOIN gives us some additional flexibility to easily return all of the employees if we'd like, or to even check for employees who do not have a title (orphan records).

-- Return employees and display their title
SELECT *
FROM tableA
JOIN tableB
ON tableA.id = tableB.id;
-- 1 Kenny 1 Analyst
-- 1 Rob 1 Analyst
-- 1 Greg 1 Analyst
-- 2 John 2 Sales

-- Which employees do not have a title?
SELECT *
FROM tableA
LEFT JOIN tableB
ON tableA.id = tableB.id
WHERE tableB.id IS NULL;
-- 4 Molly NULL NULL

In the first query above, Molly falls out because she does not have a title. If we would have liked her to appear in the record set, we could simply change the JOIN to a LEFT JOIN and she would appear with NULL data from tableB.

If you have many IN statements littered throughout your code, you should compare the performance of these queries against an EXISTS or JOIN version of the same query - you'll likely see performance gains.

I hope this illustrated some of the subtle differences between INs, EXISTS, and JOINs. Questions and feedback in the comments are appreciated.