Increase Performance Of LINQ By Parallelism

Introduction

In everyday life, we get work done faster as a team: the work is divided among the team members, and each member works on their part in parallel with the others. Computers use the same idea: one task is split into multiple smaller tasks, which are distributed among processors. The smaller tasks execute concurrently, which can improve performance. This mechanism is called parallelism.

LINQ (Language Integrated Query) is familiar to most C# developers, and operators like Where, Select, and GroupBy are part of everyday coding. The ParallelEnumerable class in .NET provides parallel versions of these LINQ extension methods, known as PLINQ (Parallel LINQ). In this article, I want to show how to execute LINQ queries on data in parallel.
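
A minimal sketch of how the two sets of operators coexist, using a small integer array as an illustrative stand-in for real data (not from the article) and assuming a .NET 6 or later console project with top-level statements. The compiler picks the operator from the static type of the source: on a plain array, Where binds to Enumerable.Where, while after AsParallel() the same call binds to ParallelEnumerable.Where.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] numbers = Enumerable.Range(1, 10).ToArray();

// On a plain array, Where binds to Enumerable.Where and runs sequentially.
IEnumerable<int> sequentialEvens = numbers.Where(n => n % 2 == 0);

// AsParallel() returns a ParallelQuery<int>, so the same Where call now binds
// to ParallelEnumerable.Where and the filter can run on several threads.
ParallelQuery<int> parallelEvens = numbers.AsParallel().Where(n => n % 2 == 0);

// Both queries select the same elements; only the execution strategy differs.
Console.WriteLine(sequentialEvens.Sum()); // prints 30
Console.WriteLine(parallelEvens.Sum());   // prints 30
```

Because ParallelQuery<T> also implements IEnumerable<T>, a parallel query can still be consumed with an ordinary foreach loop.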

Let's define a City class and load some sample data to use with LINQ and PLINQ.

class City
{
    public int Id { get; set; }
    public string CityName { get; set; }
    public string Country { get; set; }
}

// Sample data: four cities repeated five times (Ids 1 to 20).
var cities = new[]
{
    new City { Id = 1, CityName = "Turku", Country = "Finland" },
    new City { Id = 2, CityName = "Paris", Country = "France" },
    new City { Id = 3, CityName = "Oslo", Country = "Norway" },
    new City { Id = 4, CityName = "Helsinki", Country = "Finland" },
    new City { Id = 5, CityName = "Turku", Country = "Finland" },
    new City { Id = 6, CityName = "Paris", Country = "France" },
    new City { Id = 7, CityName = "Oslo", Country = "Norway" },
    new City { Id = 8, CityName = "Helsinki", Country = "Finland" },
    new City { Id = 9, CityName = "Turku", Country = "Finland" },
    new City { Id = 10, CityName = "Paris", Country = "France" },
    new City { Id = 11, CityName = "Oslo", Country = "Norway" },
    new City { Id = 12, CityName = "Helsinki", Country = "Finland" },
    new City { Id = 13, CityName = "Turku", Country = "Finland" },
    new City { Id = 14, CityName = "Paris", Country = "France" },
    new City { Id = 15, CityName = "Oslo", Country = "Norway" },
    new City { Id = 16, CityName = "Helsinki", Country = "Finland" },
    new City { Id = 17, CityName = "Turku", Country = "Finland" },
    new City { Id = 18, CityName = "Paris", Country = "France" },
    new City { Id = 19, CityName = "Oslo", Country = "Norway" },
    new City { Id = 20, CityName = "Helsinki", Country = "Finland" }
};

The City class and sample data for the cities array.

Now the data is stored in the “cities” variable, and we can start running LINQ and PLINQ queries against it.

What is LINQ?

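By default, LINQ executes a query sequentially: it processes the elements one after another, in the order of the source, on the thread that enumerates the results. Assume we want to find the records for cities in Finland.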

var finCities = cities.Where(c => c.Country == "Finland");
foreach (City city in finCities)
{
    Console.WriteLine(city.CityName);
}

A LINQ query that gets the records for Finnish cities.

Output

Figure 1. Console output of the sequential LINQ query.

Note that the records in Figure 1 appear in the same order as in the cities array, because the query processes the elements one at a time, in sequence. This is the default behavior of LINQ.
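
To make the ordering easier to see, the following sketch prints the Id of each Finnish city instead of its name. It is a hypothetical stand-in for the article's code: it generates the same 20 records as value tuples (Id, CityName, Country) rather than City objects, so that it runs as a single top-level-statement file (.NET 6 or later assumed).

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the cities array: value tuples with the same
// Ids, names and countries (Turku, Paris, Oslo, Helsinki repeating).
string[] names = { "Turku", "Paris", "Oslo", "Helsinki" };
string[] countries = { "Finland", "France", "Norway", "Finland" };
var cities = Enumerable.Range(1, 20)
    .Select(id => (Id: id, CityName: names[(id - 1) % 4], Country: countries[(id - 1) % 4]))
    .ToArray();

// Sequential LINQ visits the elements in array order, so the Ids come out ascending.
int[] finIds = cities
    .Where(c => c.Country == "Finland")
    .Select(c => c.Id)
    .ToArray();

Console.WriteLine(string.Join(", ", finIds)); // prints 1, 4, 5, 8, 9, 12, 13, 16, 17, 20
```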

What is PLINQ?

Assume we want to execute the same query, but in parallel. We can use the AsParallel() method of the ParallelEnumerable class: it turns the source into a ParallelQuery<City>, so the Where call that follows runs in parallel.

var finCities = cities.AsParallel().Where(c => c.Country == "Finland");

Getting the records with PLINQ (the results are printed with the same foreach loop as before).

Output

Figure 2. Console output of the PLINQ query.

Note that the records are no longer in sequence: their order differs from the LINQ output in Figure 1 and from the order of the data in the “cities” variable, and it can change from one run to the next.
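
The following sketch runs the filter with AsParallel() and prints the Ids of the matching records. As before, it uses hypothetical value tuples generated to match the cities array and assumes .NET 6 or later top-level statements. The first printed line may come out in a different order on each run, but sorting the results shows that PLINQ returns exactly the same records as the sequential query.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the cities array (same Ids, names and countries).
string[] names = { "Turku", "Paris", "Oslo", "Helsinki" };
string[] countries = { "Finland", "France", "Norway", "Finland" };
var cities = Enumerable.Range(1, 20)
    .Select(id => (Id: id, CityName: names[(id - 1) % 4], Country: countries[(id - 1) % 4]))
    .ToArray();

// PLINQ without AsOrdered(): partitions are filtered concurrently and their
// results are merged as they become available, so the order is not guaranteed.
int[] parallelIds = cities
    .AsParallel()
    .Where(c => c.Country == "Finland")
    .Select(c => c.Id)
    .ToArray();

// The order may differ from run to run...
Console.WriteLine(string.Join(", ", parallelIds));

// ...but the set of matching records is the same as with sequential LINQ.
int[] sortedIds = parallelIds.OrderBy(id => id).ToArray();
Console.WriteLine(string.Join(", ", sortedIds)); // prints 1, 4, 5, 8, 9, 12, 13, 16, 17, 20
```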

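To get the results in source order while the query still executes in parallel, call the AsOrdered() method after AsParallel(). Preserving the order adds some overhead, so use it only when the order matters.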

var finCities = cities
    .AsParallel()
    .AsOrdered()
    .Where(c => c.Country == "Finland");

A PLINQ query that returns the results in source order.

The output is the same as in Figure 1.
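
A quick way to check this is to compare the ordered PLINQ result with the sequential one. This sketch again uses hypothetical value tuples that mirror the cities array and assumes .NET 6 or later top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the cities array (same Ids, names and countries).
string[] names = { "Turku", "Paris", "Oslo", "Helsinki" };
string[] countries = { "Finland", "France", "Norway", "Finland" };
var cities = Enumerable.Range(1, 20)
    .Select(id => (Id: id, CityName: names[(id - 1) % 4], Country: countries[(id - 1) % 4]))
    .ToArray();

// Reference result from sequential LINQ.
int[] sequentialIds = cities
    .Where(c => c.Country == "Finland")
    .Select(c => c.Id)
    .ToArray();

// AsOrdered() makes PLINQ keep track of each element's position in the source
// and restore that order when the partial results are merged.
int[] orderedIds = cities
    .AsParallel()
    .AsOrdered()
    .Where(c => c.Country == "Finland")
    .Select(c => c.Id)
    .ToArray();

Console.WriteLine(orderedIds.SequenceEqual(sequentialIds)); // prints True
```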

Conclusion

The records in Fig 2 are unordered because my machine has multiple processors, and by default PLINQ splits the work across all the processors available on the machine, up to a fixed upper limit (512 in current .NET versions). To control how many tasks PLINQ runs concurrently, call the WithDegreeOfParallelism(degreeOfParallelism) method and pass the limit as its parameter. PLINQ pays off when a query handles a lot of data or does expensive work per element; for small, cheap queries, the cost of partitioning the data and merging the results can outweigh the gain. This article is a small demonstration of how to use PLINQ, so if you compare the execution times, you would not notice much difference because the task is very small.
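
To see where PLINQ helps, the query needs real work per element. The following sketch is an illustrative benchmark, not a measurement from this article: it counts the primes up to 1,000,000 with a deliberately slow trial-division test (IsPrime is a hypothetical helper), once with LINQ, once with PLINQ at the default degree of parallelism, and once limited to two concurrent tasks with WithDegreeOfParallelism(2). The value 2 is an arbitrary choice, and the timings depend entirely on your machine, so no numbers are claimed here.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper: a deliberately slow trial-division primality test,
// so that each element costs enough CPU time for parallelism to matter.
static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; (long)d * d <= n; d++)
    {
        if (n % d == 0) return false;
    }
    return true;
}

int[] numbers = Enumerable.Range(1, 1_000_000).ToArray();
var stopwatch = Stopwatch.StartNew();

// Sequential LINQ: one thread tests every number.
int sequentialCount = numbers.Count(IsPrime);
Console.WriteLine($"LINQ:          {sequentialCount} primes, {stopwatch.ElapsedMilliseconds} ms");

// PLINQ with the default degree of parallelism (all available processors).
stopwatch.Restart();
int parallelCount = numbers.AsParallel().Count(IsPrime);
Console.WriteLine($"PLINQ:         {parallelCount} primes, {stopwatch.ElapsedMilliseconds} ms");

// PLINQ limited to at most two concurrently executing tasks.
stopwatch.Restart();
int limitedCount = numbers
    .AsParallel()
    .WithDegreeOfParallelism(2)
    .Count(IsPrime);
Console.WriteLine($"PLINQ (DOP 2): {limitedCount} primes, {stopwatch.ElapsedMilliseconds} ms");
```

All three queries return the same count; only the elapsed time should differ. Timings from a single Stopwatch run are noisy, so repeat the measurement a few times before drawing conclusions.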

