Performance Comparison Using Three Different Methods for Joining Lists

Combining Lists comparing AddRange, Arrays and Concat

I had to combine two large lists, and was curious what the performance was of various methods. The 3 methods that came to mind was:

  1. Using AddRange
  2. Using an array to copy both lists into, then convert the array back to the final list.
  3. Using Linq's Concat extension method.

I was looking for the straight union of the two lists. No sorting, merging or filtering.

To compare the methods I had in mind, I decided to combine 2 huge lists and measure the time it took. That's not perfect, since it's a rare use-case, but performance tests are necessarily contrived in one dimension or other. Milliseconds and up are easier to compare than ticks, so I use a dataset that gets me in that range.

When it comes to measuring performance, I am a bit paranoid about compiler and run-time optimizations, and a collection with all zeroes strikes me as a good target to optimize for. So I add some guff to the collections before combining them.

I use the StopWatch class as time-taker. I don't like repeated start/stop calls, as it's easy to inadvertently get some code in between start and stop that was not part of the code to be measured. I isolated the StopWatch calls using an interface like this:

interface ITimedWorker
{
    string Label { get; }
    bool Check();
    void Work();
}

Then I can make a single measure with a method like this:

static void RunTimedOperation(ITimedWorker op)
{
    Stopwatch sw = new Stopwatch();

    Log("Starting {0}", op.Label);
    GC.Collect();
    sw.Start();
    op.Work();
    sw.Stop();
    if (!op.Check())
        Log("{0} failed its check", op.Label);
    Log("{0} finished in:\n\t{1:D2}:{2:D2}:{3:D2}:{4:D4}", op.Label, sw.Elapsed.Hours, sw.Elapsed.Minutes, sw.Elapsed.Seconds, sw.Elapsed.Milliseconds);
}

To get around garbage collection interference, I tried various ways to disable or discourage it during the actual combine operation I wanted to measure, but it did not work for the data sizes I was using. I found that inducing the collect just before the operation was enough to not trigger any collections where I didn't want them.

Performance Diagnostics Session Screenshot

The test program is written to run either a single test or all three. When running all three tests, to alleviate the issues with garbage collection, the code picks pseudo-randomly which method to run first. It is an attempt to be fair to each of the three methods, since the garbage collector will be more likely to run in subsequent operations compared to the first one completed.

For the combining logic, I created a common base class for the three methods:

abstract class ListJoinerBase
{
    protected List<object> m_l1, m_l2, m_lCombined;
    private long m_goalCount;

    protected ListJoinerBase(List<object> l1, List<object> l2)
    {
        m_l1 = l1; m_l2 = l2;
        m_goalCount = m_l1.Count + m_l2.Count;
    }

    public abstract void Work();
    public bool Check()
    {
        Debug.Assert(m_lCombined.Count == m_goalCount);
        return m_lCombined.Count == m_goalCount;
    }
}

I checked correctness with the debugger. To minimize the chance of new errors getting introduced as code is added, I use a simple check method, which just verifies that the combined list is of the expected size.

Then remaining work was to add labels and implement the Work() method for the three specializations:

class ListJoinerArray : ListJoinerBase, ITimedWorker
{
    [...]
    public override void Work()
    {
        object[] combined = new object[m_l1.Count + m_l2.Count];

        m_l1.CopyTo(combined);
        m_l2.CopyTo(combined, m_l1.Count);
        m_lCombined = new List<object>(combined);
    }
}
class ListJoinerAddRange : ListJoinerBase, ITimedWorker
{
    [...]
    public override void Work()
    {
        m_lCombined = new List<object>();
        m_lCombined.AddRange(m_l1);
        m_lCombined.AddRange(m_l2);
    }
}
class ListJoinerConcat : ListJoinerBase, ITimedWorker
{
    [...]
    public override void Work()
    {
        m_lCombined = m_l1.Concat(m_l2).ToList();
    }
}

Results

The Array and AddRange methods were close, with an edge to Array copying. The Concat method was somewhat behind. The first two methods came in at 18-24 milliseconds for the data size I was using. The Concat method took 144-146 milliseconds. Based on this, I decided to use the array copying method. Before deciding yourself, I encourage you to download the program, play with the methods and add any others you want to compare with, and come to your own conclusions.

Performance Results

Appendix: Sample output from program

[usrdir]\source\repos\ListCombineDemo\results>..\bin\Release\ListCombineDemo.exe /all
Starting Join lists Using Concatenation
Join lists Using Concatenation finished in:
        00:00:00:0147
Starting Join lists Using Arrays
Join lists Using Arrays finished in:
        00:00:00:0018
Starting Join lists Using AddRange
Join lists Using AddRange finished in:
        00:00:00:0024

[usrdir]\source\repos\ListCombineDemo\results>..\bin\Release\ListCombineDemo.exe /all
Starting Join lists Using AddRange
Join lists Using AddRange finished in:
        00:00:00:0024
Starting Join lists Using Concatenation
Join lists Using Concatenation finished in:
        00:00:00:0140
Starting Join lists Using Arrays
Join lists Using Arrays finished in:
        00:00:00:0019

[usrdir]\source\repos\ListCombineDemo\results>..\bin\Release\ListCombineDemo.exe /random
Starting Join lists Using AddRange
Join lists Using AddRange finished in:
        00:00:00:0023

[usrdir]\source\repos\ListCombineDemo\results>..\bin\Release\ListCombineDemo.exe /concat
Starting Join lists Using Concatenation
Join lists Using Concatenation finished in:
        00:00:00:0145

[usrdir]\source\repos\ListCombineDemo\results>..\bin\Release\ListCombineDemo.exe /arrays
Starting Join lists Using Arrays
Join lists Using Arrays finished in:
        00:00:00:0020

Appendix: Some raw results in seconds

Method Seconds
Arrays 0.0021
Arrays 0.0019
Arrays 0.0019
Arrays 0.0018
Arrays 0.0019
Arrays 0.0019
Arrays 0.0019
Arrays 0.0018
Arrays 0.002
Arrays 0.0019
Arrays 0.0018
Concatenation 0.0145
Concatenation 0.0145
Concatenation 0.0144
Concatenation 0.0146
Concatenation 0.0144
Concatenation 0.0145
Concatenation 0.0144
Concatenation 0.0145
Concatenation 0.0145
Concatenation 0.0147
Concatenation 0.0144
AddRange 0.0022
AddRange 0.0021
AddRange 0.0021
AddRange 0.0022
AddRange 0.0022
AddRange 0.0022
AddRange 0.0027
AddRange 0.0025
AddRange 0.0024
AddRange 0.0025
AddRange 0.0023


Similar Articles