Grouped Joins in LINQ

Rahul Singh
11y
80.3k
0
4

Article

We have 4 types of joins for Language Integrated Query (LINQ), these are:

Inner Join
Left Outer Join
Cross Join / Full Outer Join
Group Join

These names not only sound alike in name but also are like the type of joins present in any relational database management system like SQL Server and they even operate in the same way. The following figure provides a high-level explanation of all these joins except the Group Join that is specific to LINQ.

So, in this article we will discuss Group Joins with a few examples.

According to MSDN:

The group join is useful for producing hierarchical data structures. It pairs each element from the first collection with a set of correlated elements from the second collection.

This definiation will become more clear with the following examples.

Example 1

Consider we have the following Student and Department tables:

Student Table:

StudentID	Name	Gender	DepartmentID
1	Mohan	Male	2
2	Vivek	Male	2
3	Saurabh	Male	1
4	Priya	Female	2
5	Sohan	Male	1
6	Monika	Female	2

Department Table:

DepartmentID	Department Name
1	CSE
2	ECE
3	IT

The following is the object representation of these tables:

List<Student> students = new List<Student>
{
new Student { StudentID =1, Name = "Mohan", Gender = "Male", DepartmentID =2 },
new Student { StudentID =2, Name = "Vivek", Gender = "Male", DepartmentID =2 },
new Student { StudentID =3, Name = "Saurabh", Gender = "Male", DepartmentID =1 },
new Student { StudentID =4, Name = "Priya", Gender = "Female", DepartmentID =2 },
new Student { StudentID =5, Name = "Sohan", Gender = "Male", DepartmentID =1 },
new Student { StudentID =6, Name = "Monika", Gender = "Female", DepartmentID =2 }
};
List<Department> departments = new List<Department>
{
new Department { DepartmentID = 1, DepartmentName = "CSE" },
new Department { DepartmentID = 2, DepartmentName = "ECE" },
new Department { DepartmentID = 3, DepartmentName = "IT" },
};

Where, I have used the following types:

public class Student
{
public int StudentID { get; set; }
public string Name { get; set; }
public string Gender { get; set; }
public int DepartmentID { get; set; }
}
public class Department
{
public int DepartmentID { get; set; }
public string DepartmentName { get; set; }
}

Now, what we want is to list all the departments and students within them. Here is the LINQ query for this:

var result = from d in departments
join s in students
on d.DepartmentID equals s.DepartmentID into g
select new
{
DepartmentName = d.DepartmentName,
Students = g
};

As we can see, the preceding query is like a basic inner join except the into g statement that differentiates it from inner joins. Now, if we again revisit the definition of Group Join, each element (departmentId from the Department table in this case) is paired with a set of correlated elements in the second collection (Students collection). So "g" in this case will hold the Student object, so we can use that in our foreach loop like this:

foreach (var item in result)
{
Console.WriteLine("DepartmentName: {0}", item.DepartmentName);
foreach (var stu in item.Students)
{
Console.WriteLine(stu.Name);
}
Console.WriteLine();
}

This will produce the following output:

The same query can also be written in Method Syntax using the GroupJoin Enumerable method. But, I personally prefer the query syntax when working with joins in LINQ since the method syntax in this case is a bit complex and difficult to read. Here is the equivalent query:

var result1 = departments.GroupJoin(students,
d => d.DepartmentID,
s => s.DepartmentID,
(d, g) => new
{
DepartmentName = d.DepartmentName,
Students = g
});

Example 2

After the first example most of us would think that it's almost similar to inner joins, but in this example I will explain a bit more complex problem statement that will clarify group joins.

Consider I have two tables Parent and Child linked with a foreign key, but here for simplicity I am displaying them as a collection of objects:

IEnumerable<Parent> parents = new Parent[]
{
new Parent { ID = 1, Description = "Apple" },
new Parent { ID = 2, Description = "Orange" },
new Parent { ID = 3, Description = "Pear" },
new Parent { ID = 4, Description = "Banana" }
};
IEnumerable<Child> children = new Child[]
{
new Child { ID = 1, ParentID = 2, Description = "Mercury", Condition = null },
new Child { ID = 2, ParentID = 3, Description = "Venus", Condition = null },
new Child { ID = 3, ParentID = 3, Description = "Earth", Condition = null },
new Child { ID = 4, ParentID = 4, Description = "Mars", Condition = null },
new Child { ID = 5, ParentID = 4, Description = "Saturn", Condition = "> 5" }
};

Now, I want to fetch the result based on certain conditions that are:

Return exactly 1 row per parent regardless of how many children exist.
Return null/zero values if the child doesn't exist, in other words default values of type.
Return related data from a child with a null condition. If more than one exists with a null condition then return just the first one.
Return a count of the number of child records with a non-null condition.

So based on input data, if we consider the preceding rules the output should be:

ParentId	PrimaryChildId	Description	SubDescription	Count
1	0	Apple		0
2	1	Orange	Mercury	0
3	2	Pear	Venus	0
4	4	Banana	Mars	1

This is a classic example of a Group Join because here for each element from the first collection, in other words ParentId from the Parent table, we need to pair with respective parentsIds present in the Child table. When the basic initial part of LINQ query is executed, in other words:

var query = from p in parents
join c in children
on p.ID equals c.ParentID into g

The items will be paired as shown below in memory:

---------------------------------------------------------------------------------------------------------

       ParentId                                                g
---------------------------------------------------------------------------------------------------------
      1                                                                null

      2                       ID = 1, ParentID = 2, Description = "Mercury", Condition = null

    3                      ID = 2, ParentID = 3, Description = "Venus", Condition = null
                              ID = 3, ParentID = 3, Description = "Earth", Condition = null

      4                      ID = 4, ParentID = 4, Description = "Mars", Condition = null
                              ID = 5, ParentID = 4, Description = "Saturn", Condition = "> 5"

--------------------------------------------------------------------------------------------------------------

As we can see for each parent Id of the Parent object a collection of Child objects with matching ParentId is generated, so now all that is left is to play with this resulting collection based on the rules provided. Here is the complete query:

var query = from p in parents
join c in children
on p.ID equals c.ParentID into g
let firstNullElement = g.FirstOrDefault(x => x.Condition == null)
select new
{
ParentID = p.ID,
PrimaryChildID = firstNullElement != null ? firstNullElement.ID : 0,
Description = p.Description,
SubDescription = firstNullElement!= null ? firstNullElement.Description
: String.Empty,
ConditionalCount = g.Count(x => x.Condition != null)
};

As you can see, I am fetching the first element that has a "condition" column as null in the "g" collection, so while projecting the actual result this variable can be used.

Please let me know if you have any issues. I have attached the complete working code of these examples.

Happy Coding :)