Post

Understanding Joins in SQL Server

As we know SQL Server has been created based on two mathematical concepts, they are Set Theory and Predicate Logic.

In set theory, the cartesian product is the basic operation. Joins in SQL Server also works in the same way as the Cartesian product.

In mathematics, the Cartesian Product of sets A and B is defined as the set of all ordered pairs (x, y) such that x belongs to A and y belongs to B.

For example, if A = {1, 2} and B = {3, 4, 5}, then the Cartesian Product of A and B is {(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)}.

When we apply joins between tables, the same cartesian product will happen first.

Joins are required to return the data from multiple tables. These tables should have common functionally similar columns to have a join condition between tables.

We will understand the joins after taking a look at cross-join (cartesian product) first.

CROSS Join

When we apply cross join between two tables(TableA and TableB), every row in TableA will have a combination with every row in TableB.

Let's take an example and look at it.

``````IF OBJECT_ID('dbo.ProductCategory') IS NOT NULL
DROP TABLE dbo.ProductCategory;
CREATE TABLE dbo.ProductCategory (
ProductCategoryId INT PRIMARY KEY IDENTITY,
CategoryName VARCHAR(500)
);
GO
INSERT INTO dbo.ProductCategory (CategoryName) VALUES ('Fruits');
INSERT INTO dbo.ProductCategory (CategoryName) VALUES ('Vegetables');
INSERT INTO dbo.ProductCategory (CategoryName) VALUES ('Water');
INSERT INTO dbo.ProductCategory (CategoryName) VALUES ('Dairy Based Food');
INSERT INTO dbo.ProductCategory (CategoryName) VALUES ('Meat');
GO
SELECT * FROM dbo.ProductCategory;
GO
IF OBJECT_ID('dbo.Product') IS NOT NULL
DROP TABLE dbo.Product;
CREATE TABLE dbo.Product (
ProductId INT PRIMARY KEY IDENTITY,
ProductName VARCHAR(500),
ProductCategoryId INT NOT NULL
);
GO
INSERT INTO dbo.Product (ProductName, ProductCategoryId) VALUES ('Apple', 1);
INSERT INTO dbo.Product (ProductName, ProductCategoryId) VALUES ('Mango', 1);
INSERT INTO dbo.Product (ProductName, ProductCategoryId) VALUES ('Capsicum', 2);
INSERT INTO dbo.Product (ProductName, ProductCategoryId) VALUES ('Tomato', 2);
INSERT INTO dbo.Product (ProductName, ProductCategoryId) VALUES ('Milk', 4);
INSERT INTO dbo.Product (ProductName, ProductCategoryId) VALUES ('Curd', 4);
INSERT INTO dbo.Product (ProductName, ProductCategoryId) VALUES ('Chicken', 5);
INSERT INTO dbo.Product (ProductName, ProductCategoryId) VALUES ('Mutton', 5);
INSERT INTO dbo.Product (ProductName, ProductCategoryId) VALUES ('Pasta', 50);
INSERT INTO dbo.Product (ProductName, ProductCategoryId) VALUES ('Brown Rice', 65);
GO
SELECT * FROM dbo.Product;
GO
SELECT * FROM dbo.ProductCategory
CROSS JOIN dbo.Product;
GO
``````

In general, we refer to the left table as the ProductCategory table and the right table as the Product table. Every row from the ProductCategory table will combine each row with each row from the product table.

Below is the one-part result of the cross-join. Please run the above script to check the full resultset.

Inner Join

when we apply INNER JOIN between tables, the result is going to be only the rows which are satisfying the given condition. Non-matching rows will be ignored.

``````SELECT
PC.ProductCategoryId,
PC.CategoryName,
P.ProductId,
P.ProductName
FROM dbo.ProductCategory PC
INNER JOIN dbo.Product P ON PC.ProductCategoryId = P.ProductCategoryId;
``````

LEFT OUTER Join/ LEFT Join

when we apply LEFT JOIN between tables, the result is going to be only the rows which are satisfying the given condition plus Non-matching rows from left side table, null values will be returned for the corresponding rows from the right side table.

``````SELECT
PC.ProductCategoryId,
PC.CategoryName,
P.ProductId,
P.ProductName
FROM dbo.ProductCategory PC
LEFT JOIN dbo.Product P ON PC.ProductCategoryId = P.ProductCategoryId;
``````

RIGHT OUTER Join / RIGHT Join

when we apply RIGHT JOIN between tables, the result is going to be only the rows which are satisfying the given condition plus Non-matching rows from the RIGHT side table, null values will be returned for the corresponding rows from the LEFT side table.

``````SELECT
PC.ProductCategoryId,
PC.CategoryName,
P.ProductId,
P.ProductName
FROM dbo.ProductCategory PC
RIGHT JOIN dbo.Product P ON PC.ProductCategoryId = P.ProductCategoryId;
``````

FULL OUTER Join / FULL Join

when we apply FULL JOIN between tables, the result is going to be only the rows which are satisfying the given condition plus Non-matching rows from left side table, plus non-matching rows from right table, null values will be returned for the corresponding rows from the both side tables.

``````SELECT
PC.ProductCategoryId,
PC.CategoryName,
P.ProductId,
P.ProductName
FROM dbo.ProductCategory PC
FULL JOIN dbo.Product P ON PC.ProductCategoryId = P.ProductCategoryId;
``````

SELF JOIN in SQL Server

When we join a table with itself, it is called a "self join". It is essentially an inner join where both the left and right tables are the same table. We use a self join when a logical relationship exists between two columns within the same table.

Example

List out all the employees along with their managers.

``````SELECT
E.name AS EmployeeName,
M.Name AS ManagerName
FROM
Employee E
INNER JOIN
Employee M ON E.ManagerId = M.EmployeeId;
``````

In the same way, we can join more than two tables as well. The only thing we need to find out is the common table column between tables and the columns to be returned.

There is another important difference to understand between the join condition and the where condition.

what is the above difference/how does the above difference behave in the case of inner join and left join?

Two important points here.

1. Matching Predicate
2. Filtering Predicate

Let's use the below query to understand the above two points.

``````SELECT
PC.ProductCategoryId,
PC.CategoryName,
P.ProductId,
P.ProductName
FROM
dbo.ProductCategory PC
INNER JOIN
dbo.Product P ON PC.ProductCategoryId = P.ProductCategoryId
WHERE
PC.CategoryName = 'Meat';
``````

When we write any condition in the ON clause it becomes a Matching Predicate. The condition we write in where clause is Filtering Predicate.

Let's modify the above query and try to understand what will happen.

``````SELECT
PC.ProductCategoryId,
PC.CategoryName,
P.ProductId,
P.ProductName
FROM
dbo.ProductCategory PC
INNER JOIN
dbo.Product P ON PC.ProductCategoryId = P.ProductCategoryId AND PC.CategoryName = 'Meat';
``````

In the case of inner join, it does not make any difference.

``````SELECT
PC.ProductCategoryId,
PC.CategoryName,
P.ProductId,
P.ProductName
FROM
dbo.ProductCategory PC
INNER JOIN
dbo.Product P ON PC.ProductCategoryId = P.ProductCategoryId
WHERE
PC.CategoryName = 'Meat';
``````

Now move the filter condition from where clause to on as below.

``````SELECT
PC.ProductCategoryId,
PC.CategoryName,
P.ProductId,
P.ProductName
FROM
dbo.ProductCategory PC
INNER JOIN
dbo.Product P ON PC.ProductCategoryId = P.ProductCategoryId AND PC.CategoryName = 'Meat';
``````

Now we understand that there is no difference in the case of writing conditions in where clause or on clause when we use inner join.

But in the case of left join, there is an important difference.

``````SELECT
PC.ProductCategoryId,
PC.CategoryName,
P.ProductId,
P.ProductName
FROM
dbo.ProductCategory PC
LEFT JOIN
dbo.Product P ON PC.ProductCategoryId = P.ProductCategoryId
WHERE
PC.CategoryName = 'Meat';
``````

Now move the condition from where clause to on clause.

See the surprising result.

So, please note that the condition we write in on clause is matching predicate, it won't filter the data, if there is no match from the right side table, it will give a null value. You have to write your filter conditions always in the where clause to filter the data.

Joins can be used to update statements and delete statements as well.

When you write update/delete statements based on left join be careful. Please test your query to understand whether it is giving the correct result or not.

Next Recommended Reading Self Joins in SQL Server