Increase Performance Of LINQ By Parallelism

Parallelism solves many problems in computing when it is applied in the right way. It is a poor fit when the task is small or needs heavy synchronization of shared resources; in those scenarios it can hurt performance instead of improving it.

In everyday life, we get work done faster when we have a team. The work is divided among team members, and each member works on a task in parallel with the others. The same mechanism applies in computing: one task is split into multiple smaller tasks that are distributed among processors. The split tasks execute concurrently, which can increase performance. This mechanism is called parallelism.

LINQ (Language Integrated Query) is very familiar to C# developers, and operators like Where, Select, and GroupBy are used in daily coding. The ParallelEnumerable class in the .NET Framework provides parallel versions of these operators as extension methods; this is known as PLINQ (Parallel LINQ). I want to discuss how to execute LINQ queries in parallel over data.

Let's define a class City and load some data which can be used in LINQ and PLINQ.

    class City
    {
        public int Id { get; set; }

        public string CityName { get; set; }

        public string Country { get; set; }
    }

    // Sample data: the same four cities repeated five times (Ids 1 to 20).
    // Place this declaration inside a method such as Main.
    var cities = new[] {
        new City { Id = 1,  CityName = "Turku",    Country = "Finland" },
        new City { Id = 2,  CityName = "Paris",    Country = "France" },
        new City { Id = 3,  CityName = "Oslo",     Country = "Norway" },
        new City { Id = 4,  CityName = "Helsinki", Country = "Finland" },

        new City { Id = 5,  CityName = "Turku",    Country = "Finland" },
        new City { Id = 6,  CityName = "Paris",    Country = "France" },
        new City { Id = 7,  CityName = "Oslo",     Country = "Norway" },
        new City { Id = 8,  CityName = "Helsinki", Country = "Finland" },

        new City { Id = 9,  CityName = "Turku",    Country = "Finland" },
        new City { Id = 10, CityName = "Paris",    Country = "France" },
        new City { Id = 11, CityName = "Oslo",     Country = "Norway" },
        new City { Id = 12, CityName = "Helsinki", Country = "Finland" },

        new City { Id = 13, CityName = "Turku",    Country = "Finland" },
        new City { Id = 14, CityName = "Paris",    Country = "France" },
        new City { Id = 15, CityName = "Oslo",     Country = "Norway" },
        new City { Id = 16, CityName = "Helsinki", Country = "Finland" },

        new City { Id = 17, CityName = "Turku",    Country = "Finland" },
        new City { Id = 18, CityName = "Paris",    Country = "France" },
        new City { Id = 19, CityName = "Oslo",     Country = "Norway" },
        new City { Id = 20, CityName = "Helsinki", Country = "Finland" }
    };

Listing 1: The City class and sample data for the cities array

Now we have data in the “cities” variable and can start querying it with LINQ and PLINQ.
 

LINQ

A regular LINQ query runs sequentially. Assume we want to find the records for cities in Finland.

    var finCities = cities.Where(c => c.Country == "Finland");

    foreach (City city in finCities)
        Console.WriteLine(city.CityName);

Listing 2: The LINQ query to get records

Output

[Figure: console output of Listing 2] (fig:1)

Note that the records in fig:1 appear in the same order as in the source array, because the query is evaluated sequentially. This is the default behavior of LINQ.
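To make the ordering claim easy to check, here is a minimal sketch that prints the Ids of the matching records instead of the city names. The BuildCities helper and the SequentialOrderDemo class are hypothetical names introduced only for this illustration; the helper regenerates the same 20 records as Listing 1.

```csharp
using System;
using System.Linq;

class City
{
    public int Id { get; set; }
    public string CityName { get; set; }
    public string Country { get; set; }
}

static class SequentialOrderDemo
{
    // Hypothetical helper: rebuilds the 20-record sample from Listing 1.
    public static City[] BuildCities()
    {
        var names = new[] { "Turku", "Paris", "Oslo", "Helsinki" };
        var countries = new[] { "Finland", "France", "Norway", "Finland" };
        return Enumerable.Range(0, 20)
            .Select(i => new City { Id = i + 1, CityName = names[i % 4], Country = countries[i % 4] })
            .ToArray();
    }

    // Sequential LINQ: results come back in source order.
    public static int[] FinnishIds(City[] cities)
    {
        return cities.Where(c => c.Country == "Finland")
                     .Select(c => c.Id)
                     .ToArray();
    }

    static void Main()
    {
        int[] ids = FinnishIds(BuildCities());
        // Prints: 1, 4, 5, 8, 9, 12, 13, 16, 17, 20
        Console.WriteLine(string.Join(", ", ids));
    }
}
```

The Ids come out ascending because the sequential query walks the array from the first element to the last.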

PLINQ

Assume we want to execute the same query, but in parallel. We can use the AsParallel() method, which is part of the ParallelEnumerable class.

    var finCities = cities.AsParallel().Where(c => c.Country == "Finland");

Listing 3: Getting records with parallel LINQ

Output

[Figure: console output of Listing 3] (fig:2)

Note that the order of records differs from the LINQ output (fig:1) and from the order of the data in the “cities” variable (Listing 1). PLINQ does not preserve the source order by default.
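The sketch below illustrates what "unordered" means here: the parallel query returns the same set of records as the sequential one, but their order is not guaranteed. BuildCities and UnorderedPlinqDemo are hypothetical names used only for this example. On a data set this small, the parallel order may happen to match the source order on some runs.

```csharp
using System;
using System.Linq;

class City
{
    public int Id { get; set; }
    public string CityName { get; set; }
    public string Country { get; set; }
}

static class UnorderedPlinqDemo
{
    // Hypothetical helper: rebuilds the 20-record sample from Listing 1.
    public static City[] BuildCities()
    {
        var names = new[] { "Turku", "Paris", "Oslo", "Helsinki" };
        var countries = new[] { "Finland", "France", "Norway", "Finland" };
        return Enumerable.Range(0, 20)
            .Select(i => new City { Id = i + 1, CityName = names[i % 4], Country = countries[i % 4] })
            .ToArray();
    }

    // PLINQ without AsOrdered(): the result order is not guaranteed.
    public static int[] FinnishIdsParallel(City[] cities)
    {
        return cities.AsParallel()
                     .Where(c => c.Country == "Finland")
                     .Select(c => c.Id)
                     .ToArray();
    }

    static void Main()
    {
        City[] cities = BuildCities();
        int[] sequential = cities.Where(c => c.Country == "Finland").Select(c => c.Id).ToArray();
        int[] parallel = FinnishIdsParallel(cities);

        Console.WriteLine("Parallel order: " + string.Join(", ", parallel));

        // Sorting the parallel result recovers the sequential result,
        // so both queries found the same records.
        bool sameElements = parallel.OrderBy(id => id).SequenceEqual(sequential);
        Console.WriteLine("Same elements: " + sameElements);

        // This may print True or False depending on scheduling.
        Console.WriteLine("Same order: " + parallel.SequenceEqual(sequential));
    }
}
```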

We can use the AsOrdered() method to preserve the source order in the results while the query still executes in parallel. Order preservation adds some overhead, so use it only when the order matters.

    var finCities = cities.AsParallel().AsOrdered().Where(c => c.Country == "Finland");

Listing 4: Ordered results with PLINQ

The output is the same as fig:1.
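One way to confirm this without comparing console output by eye is to compare the two result sequences directly. The following is a small sketch under the same assumptions as Listing 1; BuildCities and OrderedPlinqDemo are hypothetical names introduced for the example.

```csharp
using System;
using System.Linq;

class City
{
    public int Id { get; set; }
    public string CityName { get; set; }
    public string Country { get; set; }
}

static class OrderedPlinqDemo
{
    // Hypothetical helper: rebuilds the 20-record sample from Listing 1.
    public static City[] BuildCities()
    {
        var names = new[] { "Turku", "Paris", "Oslo", "Helsinki" };
        var countries = new[] { "Finland", "France", "Norway", "Finland" };
        return Enumerable.Range(0, 20)
            .Select(i => new City { Id = i + 1, CityName = names[i % 4], Country = countries[i % 4] })
            .ToArray();
    }

    // PLINQ with AsOrdered(): runs in parallel, results keep source order.
    public static int[] FinnishIdsOrdered(City[] cities)
    {
        return cities.AsParallel()
                     .AsOrdered()
                     .Where(c => c.Country == "Finland")
                     .Select(c => c.Id)
                     .ToArray();
    }

    static void Main()
    {
        City[] cities = BuildCities();
        int[] sequential = cities.Where(c => c.Country == "Finland").Select(c => c.Id).ToArray();
        int[] ordered = FinnishIdsOrdered(cities);

        // Prints: Same order as LINQ: True
        Console.WriteLine("Same order as LINQ: " + ordered.SequenceEqual(sequential));
    }
}
```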

Conclusion

The records in fig:2 are unordered because my machine has multiple processors: PLINQ splits the source into partitions, processes them concurrently, and does not preserve the source order unless asked to. By default, PLINQ uses all processors available on the machine, up to a maximum of 64 in .NET Framework 4 (later versions raise this limit to 512). To control this, use the WithDegreeOfParallelism(n) method, where n is the maximum number of tasks PLINQ may run concurrently for the query.

PLINQ is most useful for complex queries, where each element needs expensive processing or the data set is large. This is a small example that only demonstrates how to use PLINQ. If you compare the execution time of Listing 2 and Listing 3, you will not notice much difference because the task is very small, and the overhead of partitioning and merging can even make the parallel version slower.
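To see a measurable difference, the work per element has to be heavier than a string comparison. The sketch below is not part of the original sample: it swaps the city filter for a hypothetical CPU-bound IsPrime check, caps the query at two concurrent tasks with WithDegreeOfParallelism(2), and times both versions with Stopwatch. The range size and the degree of 2 are arbitrary assumptions, and the timings will vary by machine.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class DegreeOfParallelismDemo
{
    // Hypothetical CPU-bound work, chosen only so each element is costly enough to matter.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Sequential LINQ version.
    public static int CountPrimesSequential(int max)
    {
        return Enumerable.Range(1, max).Count(n => IsPrime(n));
    }

    // PLINQ version limited to at most `degree` concurrent tasks.
    public static int CountPrimesParallel(int max, int degree)
    {
        return Enumerable.Range(1, max)
            .AsParallel()
            .WithDegreeOfParallelism(degree)
            .Count(n => IsPrime(n));
    }

    static void Main()
    {
        const int max = 2000000; // arbitrary size for the demonstration

        var stopwatch = Stopwatch.StartNew();
        int sequentialCount = CountPrimesSequential(max);
        stopwatch.Stop();
        Console.WriteLine("LINQ:  " + sequentialCount + " primes in " + stopwatch.ElapsedMilliseconds + " ms");

        stopwatch.Restart();
        int parallelCount = CountPrimesParallel(max, 2);
        stopwatch.Stop();
        Console.WriteLine("PLINQ: " + parallelCount + " primes in " + stopwatch.ElapsedMilliseconds + " ms");
    }
}
```

Both versions return the same count; only the elapsed time should differ. Lowering max to a few hundred elements is a quick way to observe the case where the parallel version gives no benefit.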

Source code

Github Link