Modern applications often have to deal with streaming data, batch pipelines, and parallel processing. ETL jobs, background processors, and event-driven systems all need a way to move items through a series of processing steps while several items are handled at the same time.
In this article, we’ll build a parallel-processing pipeline using System.Threading.Tasks.Dataflow.
What is TPL Dataflow?
Task Parallel Library (TPL) Dataflow is a library that helps you build data processing pipelines using blocks.
Each block receives input, processes it, and passes the result to the next block, like an assembly line in a factory.
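To make the assembly-line idea concrete before the full example, here is a minimal sketch with just two blocks. The class and method names (`AssemblyLineSketch`, `SquareAllAsync`) are made up for illustration and are not part of the library; only `TransformBlock`, `ActionBlock`, `LinkTo`, `Post`, `Complete`, and `Completion` come from System.Threading.Tasks.Dataflow.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class AssemblyLineSketch
{
    // Hypothetical helper: one block squares each number, the next collects the results.
    public static async Task<List<int>> SquareAllAsync(IEnumerable<int> inputs)
    {
        var results = new List<int>();

        // Station 1: transform each input into an output.
        var square = new TransformBlock<int, int>(n => n * n);

        // Station 2: consume each output. An ActionBlock runs one item at a time
        // by default, so adding to a plain List is safe here.
        var collect = new ActionBlock<int>(n => results.Add(n));

        // Connect the stations and let completion flow downstream.
        square.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var n in inputs)
        {
            square.Post(n);
        }

        square.Complete();          // no more input
        await collect.Completion;   // wait until the last item has been collected
        return results;
    }
}
```

Calling `await AssemblyLineSketch.SquareAllAsync(new[] { 1, 2, 3 })` should return 1, 4, 9 in that order, since both blocks use their default settings and process items one at a time.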
Real-world example: Car Manufacturing Pipeline
We’ll simulate a car production pipeline, where each step represents a stage:
1. Build frame
2. Install engine
3. Add interior
4. Paint
5. Deliver
Each stage can work on several cars in parallel, while every individual car still passes through the stages in order.
    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow;

    public class DataflowHandsOn
    {
        public static async Task Run()
        {
            // Configure options for parallelism: each block may process up to 5 items at once
            var parallelOptions = new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 5
            };

            // Step 1: Create cars (a single trigger message fans out into five cars)
            var carList = new TransformManyBlock<string, string>(_ =>
            {
                var cars = new List<string> { "Suzuki", "Tata", "BYD", "BMW", "Tesla" };
                return cars;
            }, parallelOptions);

            // Step 2: Create frame
            // The delays simulate work; awaiting them avoids blocking a thread-pool thread
            var createFrame = new TransformBlock<string, string>(async car =>
            {
                Console.WriteLine($"Creating frame for {car}");
                await Task.Delay(300);
                return car;
            }, parallelOptions);

            // Step 3: Put engine
            var configureEngine = new TransformBlock<string, string>(async car =>
            {
                Console.WriteLine($"Installing engine for {car}");
                await Task.Delay(300);
                return car;
            }, parallelOptions);

            // Step 4: Add interior
            var addInterior = new TransformBlock<string, string>(async car =>
            {
                Console.WriteLine($"Adding interior for {car}");
                await Task.Delay(300);
                return car;
            }, parallelOptions);

            // Step 5: Paint
            var addPaint = new TransformBlock<string, string>(async car =>
            {
                Console.WriteLine($"Painting {car}");
                await Task.Delay(300);
                return car;
            }, parallelOptions);

            // Step 6: Delivery (final stage, produces no output)
            var delivery = new ActionBlock<string>(async car =>
            {
                Console.WriteLine($"Delivering {car}");
                await Task.Delay(200);
            }, parallelOptions);

            // Link the blocks in order
            var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
            carList.LinkTo(createFrame, linkOptions);
            createFrame.LinkTo(configureEngine, linkOptions);
            configureEngine.LinkTo(addInterior, linkOptions);
            addInterior.LinkTo(addPaint, linkOptions);
            addPaint.LinkTo(delivery, linkOptions);

            // Start the workflow
            carList.Post("Start");

            // Mark completion for the head block
            carList.Complete();

            // Wait for completion of the entire pipeline
            await delivery.Completion;
            Console.WriteLine("All cars processed and delivered!");
        }
    }
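The claim that every car visits the stages in order can be checked in isolation. The sketch below is a reduced, three-stage variant of the pipeline in which each stage appends its name to the car it handles. The `Car` class, the `StageOrderSketch` class, and the stage names are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical item type: remembers which stages it has passed through.
public sealed class Car
{
    public string Name { get; }
    public List<string> Steps { get; } = new List<string>();
    public Car(string name) => Name = name;
}

public static class StageOrderSketch
{
    public static async Task<List<Car>> RunAsync(IEnumerable<string> names)
    {
        var options = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3 };
        var link = new DataflowLinkOptions { PropagateCompletion = true };
        var finished = new List<Car>();

        // Each stage records its name on the car. A car is only ever inside
        // one stage at a time, so its Steps list needs no locking.
        TransformBlock<Car, Car> Stage(string step) =>
            new TransformBlock<Car, Car>(async car =>
            {
                await Task.Delay(20); // simulated work
                car.Steps.Add(step);
                return car;
            }, options);

        var frame = Stage("frame");
        var engine = Stage("engine");
        var paint = Stage("paint");

        // Default parallelism of 1, so the shared list is written by one item at a time.
        var done = new ActionBlock<Car>(car => finished.Add(car));

        frame.LinkTo(engine, link);
        engine.LinkTo(paint, link);
        paint.LinkTo(done, link);

        foreach (var name in names)
        {
            frame.Post(new Car(name));
        }

        frame.Complete();
        await done.Completion;
        return finished;
    }
}
```

Every returned car should carry the steps frame, engine, paint in that sequence, even though up to three cars are inside each stage at once. Blocks also keep their output in input order by default (the `EnsureOrdered` option), so the cars should come out in the order they were posted.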
Understanding the Pipeline
1. TransformManyBlock
var carList = new TransformManyBlock<string, string>(...)
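This is the data source, or ingestion stage. A TransformManyBlock turns one input message into zero or more output items; here, a single trigger message produces the list of five cars.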
2. TransformBlock
var createFrame = new TransformBlock<string, string>(car => ...)
These are the independent processing units: each stage transforms its input into an output. In our example, the TransformBlock stages are frame creation, engine installation, interior setup, and painting.
3. ActionBlock
var delivery = new ActionBlock<string>(car => ...)
This is the final stage. It performs an action on each item and produces no output.
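The three block types differ mainly in shape: one input to many outputs, one input to one output, and one input to no output. The following sketch puts them side by side on a tiny input. The class name `BlockShapesSketch` and the comma-separated input format are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BlockShapesSketch
{
    public static async Task<List<string>> RunAsync(string csvLine)
    {
        var delivered = new List<string>();
        var link = new DataflowLinkOptions { PropagateCompletion = true };

        // One input -> many outputs: split a line into its fields.
        var split = new TransformManyBlock<string, string>(line => line.Split(','));

        // One input -> one output: normalize each field.
        var upper = new TransformBlock<string, string>(item => item.Trim().ToUpperInvariant());

        // One input -> no output: the end of the line.
        var collect = new ActionBlock<string>(item => delivered.Add(item));

        split.LinkTo(upper, link);
        upper.LinkTo(collect, link);

        split.Post(csvLine);
        split.Complete();
        await collect.Completion;
        return delivered;
    }
}
```

For the input `"bmw, tata,byd"`, this should return BMW, TATA, BYD: one posted message becomes three items, each is transformed once, and the final block only consumes.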
With MaxDegreeOfParallelism = 5, each block can process up to 5 items at the same time. The limit applies per block, so every stage has its own allowance of 5.
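One way to observe the effect of this setting is to count how many items a block is working on at any moment. The sketch below does that for a single ActionBlock. The class `ParallelismSketch`, the method `MeasurePeakAsync`, and the 50 ms delay are illustrative assumptions, not part of the library.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ParallelismSketch
{
    // Returns the highest number of items that were in flight at the same time.
    public static async Task<int> MeasurePeakAsync(int maxDegree, int itemCount)
    {
        var gate = new object();
        int running = 0;
        int peak = 0;

        var block = new ActionBlock<int>(async _ =>
        {
            lock (gate)
            {
                running++;
                if (running > peak) peak = running;
            }

            await Task.Delay(50); // simulated work

            lock (gate)
            {
                running--;
            }
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxDegree });

        for (int i = 0; i < itemCount; i++)
        {
            block.Post(i);
        }

        block.Complete();
        await block.Completion;
        return peak;
    }
}
```

With `maxDegree` set to 1 the measured peak is 1. With a higher value the peak can rise up to that value but never beyond it; how close it gets depends on scheduling.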
LinkTo connects one block to the next, which is how the pipeline chain is formed.
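PropagateCompletion = true means that when a block completes, its completion is passed on to the next linked block. Calling Complete() on the first block therefore completes the whole chain, stage by stage, once each block has finished processing the items it still holds.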
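The difference this flag makes is easiest to see by linking two blocks with and without it. In the sketch below, the class `CompletionSketch`, the method `TailCompletesAsync`, and the one-second timeout are assumptions made for this illustration.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class CompletionSketch
{
    // Returns true if the downstream block completed on its own
    // within one second of the head block completing.
    public static async Task<bool> TailCompletesAsync(bool propagate)
    {
        var head = new TransformBlock<int, int>(n => n + 1);
        var tail = new ActionBlock<int>(_ => { });

        head.LinkTo(tail, new DataflowLinkOptions { PropagateCompletion = propagate });

        head.Post(1);
        head.Complete();
        await head.Completion; // head is done once its output has been handed to tail

        // Race the tail's completion against a timeout.
        var first = await Task.WhenAny(tail.Completion, Task.Delay(1000));
        return first == tail.Completion;
    }
}
```

`TailCompletesAsync(true)` should return true, because the head's completion is forwarded to the tail. `TailCompletesAsync(false)` should return false: the tail keeps waiting for more input until something calls `tail.Complete()` explicitly, which is why awaiting the last block of a pipeline without propagation never finishes.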
Let’s call this method from Program.cs:
await DataflowHandsOn.Run();
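In the output, the messages from different cars and stages are interleaved, and the order of lines within a stage can vary between runs because up to five cars are processed at once. The final line, "All cars processed and delivered!", is printed only after the delivery block has completed.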
Conclusions
This output shows how TPL Dataflow supports high-throughput, parallel pipelines in which each item flows independently through ordered stages, much like a real-world processing line. The approach suits ETL (Extract, Transform, Load) scenarios in particular, where large volumes of data must be ingested, transformed, and loaded efficiently: per-stage parallelism keeps throughput high, and the block-based structure keeps complex data workflows clean and maintainable.