LINQ Join Operations Explained

Introduction

 
I've seen many SQL questions on how to perform JOIN operations with sequenced objects in applications using LINQ. That's why I've decided to produce some easy to understand samples of JOIN operations with LINQ technology which work well from LINQ to SQL.
 
In general, there are five types of joins,
  • Cross Join
    Every element of the first sequence is projected with every element of another sequence. Simply say all to all.

  • Inner Join
    Access data from two sequences that share a common value. Usually used for extending one sequence values to another, or for getting children records by setting the first sequence's key as a parent record.

    For example, the extension of the results from Products database table, with the product category name, when Products table keeps only category ID value and name is located in the Categories table.

    Or getting all menu subitems by menu parent item.

  • Left Join
    Processes all elements of the first sequence and finds out if it shares common keys with another sequence. For example, see all of your products in the database and find out if they have ever been sold or not.

  • Right Join 
    This is basically the same as Left Join, the only difference is that it will produce all results from the second sequence and finds common keys with the first sequence.

  • Full Join
    This can be considered as a combination of Left Join + Right Join. Processes all elements of two sequences through each other and finds out if they share common values.

    For common key results, values will be populated from both sequences. For independent records from each sequence, an additional row will be created and populated with the null result for the field of the second sequence.

    For Example, if we have employes and job positions Full Join between them will look like:

    Linq Join Operations

Cross Join

  1. var set1 = Enumerable.Range(1, 3);  
  2. var set2 = Enumerable.Range(4, 3);  
  3.   
  4. // Lambda Version
  5. var q = set1.Join(set2, s1 => 0, s2 => 0, (s1, s2) => new  
  6. {  
  7.   val1 = s1,  
  8.   val2 = s2  
  9. });  
  10.       
  11. // Query Version   
  12. var q = from s1 in set1  
  13.     from s2 in set2  
  14.     select new  
  15.     {  
  16.       val1 = s1,  
  17.       val2 = s2  
  18.     };  
As you can see, I took two sets. One is a collection of numbers from 1 to 3, and the second is from 4 to 6. The result is every possible combination of numbers.

Of course we can attach one more, two more, or even N more sets of numbers from each condition. The result will add corresponding columns and additional combinations, all possible combinations of numbers in sets.

This rule applies not only to simple types, but there can also be lists of custom objects and nothing will be changed from the logical point of view.

Now for some explanations about lambda for people who are not quite experienced in LINQ. Join functions are an extension method for all types of objects that are inherited from IEnumerable. In Set 1, (below), Join says that the set must be joined to another set, (set 2), which is given as a first parameter to the function. The second and third parameters define the Join Keys from set1 and set2 that must be compared in operation. While we are not looking for any similarities in sets, I used 0 as key for both sets which will always be the same.
 
1
4
1
5
1
6
2
4
2
5
2
6
3
4
3
5
3
6
 

Inner Join


A usual one to one join,
  1. var q = from p in Products  
  2.     join c in Categories  
  3.     on p.CategoryID equals c.CategoryID  
  4.     select new  
  5.     {  
  6.       ProductID = p.ProducID,  
  7.       ProductName = p.ProductName,  
  8.       CategoryID = c.CategoryID,  
  9.       CategoryName = c.CategoryName  
  10.     };  
  11.   
  12. var q = Products.Join(Categories, p => p.CategoryID, c => c.CategoryID, (p, c) => new  
  13. {  
  14.   ProductID = p.ProducID,  
  15.   ProductName = p.ProductName,  
  16.   CategoryID = c.CategoryID,  
  17.   CategoryName = c.CategoryName  
  18. });  
Both query and lambda versions of inner join operation are provided, according to the Product - Category example that I've described before.
 

Composite Key Join


There can be case scenarios when your join key can be more than one field. In that case, you need to use anonymous types and your join condition will look like:
  1. on new { t1.key1, t1.key2 } equals new { t2.key1, t2.key2 }  
Join Using a Subquery
  1. var q = from item in MenuItems  
  2.     where item.ParentID == null  
  3.     select new  
  4.     {  
  5.       ID = item.ID,  
  6.       Name = item.Name,  
  7.       Children = from subitem in MenuItems  
  8.             where subitem.ParentID == item.ID  
  9.             select new  
  10.             {  
  11.               ID = subitem.ID,  
  12.               Name = subitem.Name  
  13.             }  
  14.     };  
  15.   
  16. var q = MenuItems.Where(item => item.ParentID == null)  
  17.           .Select(item => new  
  18.           {  
  19.             ID = item.ID,  
  20.             Name = item.Name,  
  21.             Children = MenuItems.Where(subitem => subitem.ParentID == item.ID)  
  22.                       .Select(subitem => new  
  23.                       {  
  24.                         ID = subitem.ID,  
  25.                         Name = subitem.Name  
  26.                       })  
  27.           });  
One to many, parent - child implementation, example that I've described above about menu items. Parent items are geting their children using subquery.

Group Join


Group join is used for one to many relations, where you'll get the result of parent records and collection of children for each parent. For testing purposes, I took an AdventureWorks database Customer and Orders tables.
  1. DataClassesDataContext db = new DataClassesDataContext();  
  2.   
  3. var q = from c in db.Customers  
  4.     join o in db.Orders on c.CustomerID equals o.CustomerID  
  5.     into CustomerOrders  
  6.     select new  
  7.     {  
  8.       CustomerID = c.CustomerID,  
  9.       FirstName = c.FirstName,  
  10.       LastName = c.LastName,  
  11.       Orders = from co in CustomerOrders  
  12.             select new  
  13.             {  
  14.               OrderID = co.OrderID,  
  15.               OrderDate = co.OrderDate,  
  16.               ShipDate = co.ShipDate,  
  17.               SubTotal = co.SubTotal,  
  18.               Tax = co.TaxAmt  
  19.             }  
  20.     };  
  21.   
  22. var q = db.Customers.GroupJoin(db.Orders, c => c.CustomerID, o => o.CustomerID, (c, CustomerOrders) => new  
  23. {  
  24.   CustomerID = c.CustomerID,  
  25.   FirstName = c.FirstName,  
  26.   LastName = c.LastName,  
  27.   Orders = CustomerOrders.Select(co => new  
  28.   {  
  29.     OrderID = co.OrderID,  
  30.     OrderDate = co.OrderDate,  
  31.     ShipDate = co.ShipDate,  
  32.     SubTotal = co.SubTotal,  
  33.     Tax = co.TaxAmt  
  34.   })  
  35. });  
As you can see, non of the queries are using references of the Orders table, because in query it doesn't exist. While we are using GroupJoin, CustomerOrders gathers all order results for each customer and exposes it in the query.

Left Join

  1. DataClassesDataContext db = new DataClassesDataContext();  
  2.   
  3. var q = from c in db.Customers  
  4.     join o in db.Orders on c.CustomerID equals o.CustomerID  
  5.     into CustomerOrders  
  6.     from o in CustomerOrders.DefaultIfEmpty()  
  7.     select new  
  8.     {  
  9.       CustomerID = c.CustomerID,  
  10.       FirstName = c.FirstName,  
  11.       LastName = c.LastName,  
  12.       OrderID = (o == null) ? null : (int?)o.OrderID,  
  13.       OrderDate = (o == null) ? null : (DateTime?)o.OrderDate,  
  14.       ShipTo = o.ShipTo  
  15.     };  
  16.   
  17. var q = db.Customers.GroupJoin(db.Orders, c => c.CustomerID, o => o.CustomerID, (c, o) => new  
  18. {  
  19.   customer = c,  
  20.   order = o  
  21. })  
  22. .SelectMany(c => c.order.DefaultIfEmpty(), (c, o) => new  
  23. {  
  24.   CustomerID = c.customer.CustomerID,  
  25.   FirstName = c.customer.FirstName,  
  26.   LastName = c.customer.LastName,  
  27.   OrderID = (o == null) ? null : (int?)o.OrderID,  
  28.   OrderDate = (o == null) ? null : (DateTime?)o.OrderDate,  
  29.   ShipTo = o.ShipTo  
  30. });  
Pay attention at null check for orders. The reason of this implementetion is because as the OrderID and OrderDate columns are non nullable in database, LINQ also tranfers them as not nullable fields, but during the left join reference of Order's object, it becomes null where there are no orders for the customer.

Right Join

  1. DataClassesDataContext db = new DataClassesDataContext();  
  2.   
  3. var q = from o in db.Orders  
  4.     join c in db.Customers on o.CustomerID equals c.CustomerID  
  5.     into OrderCustomers  
  6.     from c in OrderCustomers.DefaultIfEmpty()  
  7.     select new  
  8.     {  
  9.       CustomerID = c == null ? null : (int?)c.CustomerID,  
  10.       FirstName = c.FirstName,  
  11.       LastName = c.LastName,  
  12.       OrderID = o.OrderID,  
  13.       OrderDate = o.OrderDate,  
  14.       ShipTo = o.ShipTo  
  15.     };  
  16.   
  17. var q1 = db.Orders.GroupJoin(db.Customers, o => o.CustomerID, c => c.CustomerID, (o, c) => new  
  18. {  
  19.   customer = c,  
  20.   order = o  
  21. })  
  22. .SelectMany(o => o.customer.DefaultIfEmpty(), (o, c) => new  
  23. {  
  24.   CustomerID = c == null ? null : (int?)c.CustomerID,  
  25.   FirstName = c.FirstName,  
  26.   LastName = c.LastName,  
  27.   OrderID = o.order.OrderID,  
  28.   OrderDate = o.order.OrderDate,  
  29.   ShipTo = o.order.ShipTo  
  30. });  
I think Right Join doesn't require any explanation because it pretty much the same as Left Join, only the places for tables are wrapped.

Full Join


As a Full Join is a union of Left and Right Joins, if we convert this concept to code, the result will be obvious,
  1. DataClassesDataContext db = new DataClassesDataContext();  
  2.   
  3. var left = from e in db.Employers  
  4.       join d in db.Departments on e.DepartmentID equals d.DepartmentID  
  5.       into EmployerDepartments  
  6.       from d in EmployerDepartments.DefaultIfEmpty()  
  7.       select new  
  8.       {  
  9.         EmployerID = (int?)e.EmployerID,  
  10.         Fname = e.Fname,  
  11.         Lname = e.Lname,  
  12.         EmployerDepartmentID = (int?)e.DepartmentID,  
  13.         Department = d.Caption,  
  14.         DepartmentID = d == null ? null : (int?)d.DepartmentID  
  15.       };  
  16.   
  17. var right = from d in db.Departments  
  18.       join e in db.Employers  
  19.       on d.DepartmentID equals e.DepartmentID  
  20.       into DepartmentEmployers  
  21.       from e in DepartmentEmployers.DefaultIfEmpty()  
  22.       select new  
  23.       {  
  24.         EmployerID = e == null ? null : (int?)e.EmployerID,  
  25.         Fname = e.Fname,  
  26.         Lname = e.Lname,  
  27.         EmployerDepartmentID = e == null ? null : (int?)e.DepartmentID,  
  28.         Department = d.Caption,  
  29.         DepartmentID = (int?)d.DepartmentID  
  30.       };  
  31.   
  32. var FullJoin = left.Union(right);  
  33.   

  34. var left = db.Employers.GroupJoin(db.Departments, e => e.DepartmentID, d => d.DepartmentID, (e, d) => new  
  35. {  
  36.   employer = e,  
  37.   department = d  
  38. })  
  39. .SelectMany(e => e.department.DefaultIfEmpty(), (e, d) => new  
  40. {  
  41.   EmployerID = (int?)e.employer.EmployerID,  
  42.   Fname = e.employer.Fname,  
  43.   Lname = e.employer.Lname,  
  44.   EmployerDepartmentID = (int?)e.employer.DepartmentID,  
  45.   Department = d.Caption,  
  46.   DepartmentID = d == null ? null : (int?)d.DepartmentID  
  47. });  
  48.   
  49. var right = db.Departments.GroupJoin(db.Employers, d => d.DepartmentID, e => e.DepartmentID, (d, e) => new  
  50. {  
  51.   department = d,  
  52.   employer = e  
  53. })  
  54. .SelectMany(d => d.employer.DefaultIfEmpty(), (d, e) => new  
  55. {  
  56.   EmployerID = e == null ? null : (int?)e.EmployerID,  
  57.   Fname = e.Fname,  
  58.   Lname = e.Lname,  
  59.   EmployerDepartmentID = e == null ? null : (int?)e.DepartmentID,  
  60.   Department = d.department.Caption,  
  61.   DepartmentID = (int?)d.department.DepartmentID  
  62. });  
  63.   
  64. var FullJoin = left.Union(right);  
It's useful to mention that all non nullable fields are converted to nullable because union operation must combine both null and positive values for EmployerID and DepartmentID fields. Furthermore, as some results are null and some are not, they must be implemented as nullable.