A Deep Dive into Parallel.ForEach and List.ForEach

Introduction

Parallel processing has become an essential aspect of modern software development, allowing developers to exploit the power of multi-core processors for improved performance. In C#, two commonly used ways to run an action over every element of a collection are Parallel.ForEach, which can execute iterations in parallel, and List.ForEach, which executes them sequentially. In this blog post, we'll explore the nuances of these two approaches, their differences, and when to choose one over the other.

The source code can be downloaded from GitHub.

Parallel.ForEach

Parallel.ForEach is a static method of the Parallel class in the System.Threading.Tasks namespace and provides a convenient way to parallelize the execution of a loop. It's particularly useful for processing large collections, where each iteration of the loop can be executed independently.

Basic Syntax

Parallel.ForEach(collection, item =>
{
    // Your parallelized logic here
});

Key Features of Parallel.ForEach

  • Automatic Partitioning: Parallel.ForEach automatically divides the input collection into partitions and processes them concurrently, optimizing the use of available CPU cores.
  • Load Balancing: The workload is distributed dynamically among the available threads, ensuring efficient resource utilization.
  • Cancellation Support: It supports cancellation through a CancellationToken, providing control over the parallel execution.
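As a minimal sketch of the cancellation feature listed above, the snippet below passes a CancellationToken through ParallelOptions and also caps the degree of parallelism. ProcessAll is a hypothetical helper introduced only for illustration, and the squaring is a stand-in for real work.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

using var cts = new CancellationTokenSource();
int[] numbers = Enumerable.Range(1, 1000).ToArray();

// Token not cancelled: every item is processed.
Console.WriteLine(ProcessAll(numbers, cts.Token)); // prints 1000

// Token already cancelled: Parallel.ForEach throws before doing any work.
cts.Cancel();
try
{
    ProcessAll(numbers, cts.Token);
}
catch (OperationCanceledException)
{
    Console.WriteLine("Cancelled"); // prints "Cancelled"
}

// Hypothetical helper: runs a placeholder computation on each item in
// parallel and returns how many items were processed.
static int ProcessAll(int[] items, CancellationToken token)
{
    var options = new ParallelOptions
    {
        CancellationToken = token,
        // Assumption for this sketch: use at most one worker per logical core.
        MaxDegreeOfParallelism = Environment.ProcessorCount
    };

    int processed = 0;
    Parallel.ForEach(items, options, item =>
    {
        _ = (long)item * item;                // placeholder work
        Interlocked.Increment(ref processed); // thread-safe counter update
    });
    return processed;
}
```

The counter is updated with Interlocked.Increment because the loop body may run on several threads at once; a plain processed++ would be a data race.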

List.ForEach

List.ForEach is an instance method of the generic List<T> class in C# and is not inherently parallel. It operates synchronously, processing each element of the list sequentially, in list order.

Basic Syntax

list.ForEach(item =>
{
    // Your sequential logic here
});

Key Features of List.ForEach

  • Simplicity: List.ForEach is straightforward and easy to use. It's suitable for scenarios where parallelism isn't a critical requirement.
  • Single-Threaded Execution: It executes the provided action on each element of the list on the calling thread, making it suitable for situations where parallelism may not be beneficial or could introduce unnecessary complexity.
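The two properties above can be checked with a small sketch. VisitInOrder is a hypothetical helper written for this illustration; it records the elements as List.ForEach visits them and verifies that the action runs on the caller's thread.

```csharp
using System;
using System.Collections.Generic;

var letters = new List<string> { "a", "b", "c" };
Console.WriteLine(string.Join(", ", VisitInOrder(letters))); // prints "a, b, c"

// Hypothetical helper: returns the items in the order List.ForEach visited them.
static List<string> VisitInOrder(List<string> items)
{
    int callerThread = Environment.CurrentManagedThreadId;
    var visited = new List<string>(items.Count);

    items.ForEach(item =>
    {
        // The action runs synchronously on the thread that called ForEach.
        if (Environment.CurrentManagedThreadId != callerThread)
        {
            throw new InvalidOperationException("Unexpected thread switch.");
        }
        visited.Add(item); // no locking needed: only one thread touches the list
    });

    return visited;
}
```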

Choosing Between Parallel.ForEach and List.ForEach


When to Use Parallel.ForEach?

  • Large Datasets: It excels when dealing with large datasets where parallelism can significantly improve performance.
  • CPU-Intensive Tasks: Tasks that are CPU-bound and can be easily parallelized benefit from the automatic partitioning and load balancing provided by Parallel.ForEach.
  • Optimizing Performance: When throughput matters and the iterations are independent of one another, it lets the work run in parallel across cores.
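For CPU-bound work that combines results, one common pattern is the Parallel.ForEach overload that takes a thread-local initializer and finalizer. The sketch below assumes a simple sum-of-squares workload; SumOfSquares is a hypothetical helper, not something from the original project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

Console.WriteLine(SumOfSquares(Enumerable.Range(1, 1000))); // prints 333833500

// Hypothetical helper: sums the squares of the inputs in parallel.
static long SumOfSquares(IEnumerable<int> values)
{
    long total = 0;

    Parallel.ForEach<int, long>(
        values,
        // localInit: each worker starts with its own private subtotal.
        () => 0L,
        // body: update the private subtotal; no locking is needed here.
        (value, loopState, localSum) => localSum + (long)value * value,
        // localFinally: merge each worker's subtotal into the shared total once.
        localSum => Interlocked.Add(ref total, localSum));

    return total;
}
```

Keeping a private subtotal per worker means the shared variable is touched only once per worker rather than once per item, which keeps synchronization overhead low.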

When to Use List.ForEach?

  • Small Datasets: For smaller datasets, the overhead of parallelization might outweigh the benefits.
  • Sequential Dependencies: When the order of execution is crucial, or when an iteration depends on the results of previous iterations, sequential processing is the safe choice.
  • Simplicity: In scenarios where simplicity and readability are prioritized over parallelism.
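A running total is a simple example of a sequential dependency: each output needs the sum produced by the previous iteration. The sketch below uses a hypothetical RunningTotals helper to show the idea with List.ForEach.

```csharp
using System;
using System.Collections.Generic;

var input = new List<int> { 1, 2, 3, 4 };
Console.WriteLine(string.Join(", ", RunningTotals(input))); // prints "1, 3, 6, 10"

// Hypothetical helper: returns the cumulative sums of the input list.
static List<int> RunningTotals(List<int> values)
{
    var totals = new List<int>(values.Count);
    int sum = 0;

    values.ForEach(value =>
    {
        sum += value;    // depends on the sum left by the previous iteration
        totals.Add(sum); // relies on elements being visited in list order
    });

    return totals;
}
```

Swapping in Parallel.ForEach here without further changes would be incorrect: the unsynchronized updates to sum and totals would race, and the iteration order would no longer be guaranteed.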

Performance Comparison

Below is a simple benchmarking example that compares the performance of Parallel.ForEach and List.ForEach using a CPU-bound operation on a large collection of integers.

First, ensure you have the BenchmarkDotNet package installed in your project. If you haven't installed it yet, you can do so via the NuGet Package Manager or by using the following command in the Package Manager Console.

Install-Package BenchmarkDotNet

Next, create a benchmark class using BenchmarkDotNet attributes to measure the execution time of Parallel.ForEach and List.ForEach. Here's an example.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

namespace ForEachAndListForEach
{
    public class ParallelVsListBenchmark
    {
        private List<int> data;

        [Params(1000000)] // Adjust the size of the dataset as needed
        public int DataSize;

        [GlobalSetup]
        public void Setup()
        {
            // Generating a large collection of integers
            Random rand = new Random();
            data = new List<int>(DataSize);
            for (int i = 0; i < DataSize; i++)
            {
                data.Add(rand.Next(100));
            }
        }

        [Benchmark]
        public void ParallelForEachBenchmark()
        {
            Parallel.ForEach(data, item =>
            {
                // Simulating a CPU-bound operation (e.g., some intensive calculation)
                int result = Calculate(item);
            });
        }

        [Benchmark]
        public void ListForEachBenchmark()
        {
            data.ForEach(item =>
            {
                // Simulating the same CPU-bound operation
                int result = Calculate(item);
            });
        }

        // Stand-in for a CPU-bound operation
        private int Calculate(int value)
        {
            // A single multiplication is very cheap; substitute heavier work
            // here to model a genuinely CPU-intensive calculation.
            return value * value;
        }
    }
}

Program.cs
// See https://aka.ms/new-console-template for more information
using BenchmarkDotNet.Running;
using ForEachAndListForEach;

Console.WriteLine("Performance Comparison between Parallel.ForEach and List.ForEach");

var summary = BenchmarkRunner.Run<ParallelVsListBenchmark>();

Console.ReadLine();

Result

Based on this benchmark, Parallel.ForEach tends to perform slightly better in terms of average execution time when dealing with a large dataset and CPU-bound operations. List.ForEach takes a bit longer, although the size of the difference depends on how much work each iteration does and on the system configuration.

Conclusion

Choosing between Parallel.ForEach and List.ForEach depends on the specific requirements of your application. The nature of the task, the size of the dataset, and the need for parallelism are the crucial factors in making the right decision. Both methods have their strengths, and by using each where it fits, you can optimize the performance of your C# applications.