Boxing and Performance of Collections

In this article, I will compare the performance of value types and reference types during boxing and unboxing operations, focusing on collections such as ArrayList.

A few days ago, a post on the C# Corner discussion forums raised a question about collection performance during boxing and unboxing. A collection such as an ArrayList stores its items as references of type object, so value types must be boxed when they are stored and unboxed when they are retrieved. When a collection holds a large number of items, the cost of these conversions may affect performance. I've done a bit of research on this, and here are my conclusions.

First, before turning to collections, we need to determine the general overhead of boxing and unboxing operations. I've compiled a few small test programs and examined the generated IL using ILDASM. Here are the conclusions:

General

As a general conclusion, we can say that an object is roughly equivalent to a 'void *' in C++: it is always a reference to the contained value. As a result, passing objects through function calls and returns generates very efficient code that simply moves pointers around.

a) Value Types

Value types (int, decimal, structs and so on) are stored and used directly, without any wrapper. When we box or unbox a value type, the compiler inserts a specific box or unbox IL instruction which, at run time, creates the object wrapper that holds the value (or extracts the value from it). This kind of operation requires a fair amount of work:

  • Pop the value from the evaluation stack. (May be skipped by the JIT optimizer.)
  • Allocate a new object on the managed heap to act as the wrapper.
  • Copy the popped value into the wrapper.
  • Push a reference to the wrapper onto the evaluation stack. (May be skipped by the JIT optimizer.)

Of course, the jitter may optimize this sequence, because it knows that we are going to box a specific value type. Since the jitter's internals are not documented and it doesn't generate any report, we'll leave this part alone.
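To make the steps above more concrete, here is a rough C# model of what box and unbox do conceptually. The IntBox class and the Box/Unbox helpers are hypothetical names for illustration only. The real runtime uses its own internal object layout and never calls any such class. The model only shows the essential idea: each box is a new heap allocation holding a copy of the value, and unboxing checks the type before copying the value back out.

```csharp
using System;

// Hypothetical model of a boxed int: a heap-allocated object holding its
// own copy of the value. The real runtime uses an internal layout
// (object header + type pointer + value), not this class.
sealed class IntBox
{
    public readonly int Value;
    public IntBox(int value) { Value = value; }
}

static class BoxModel
{
    // Models 'box': allocate a new heap object and copy the value into it.
    public static object Box(int value)
    {
        return new IntBox(value);
    }

    // Models 'unbox' + 'ldind.i4': check the runtime type, then copy the
    // value out. A wrong type throws InvalidCastException; a null reference
    // throws NullReferenceException when .Value is read.
    public static int Unbox(object o)
    {
        return ((IntBox)o).Value;
    }

    static void Main()
    {
        object o = Box(12345);   // models: ldloc.1; box
        int i = Unbox(o);        // models: ldloc.0; unbox; ldind.i4
        Console.WriteLine(i);    // prints 12345

        try
        {
            Unbox("12345");      // a string is not a (modeled) boxed int
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException");
        }
    }
}
```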

using System;
namespace test1
{
    class Class1
    {
        static void Main(string[] args)
        {
            object o;
            int i = 12345;
            // Box and unbox
            o = (object) i; // Boxing
            i = (int) o;    // Unboxing
        }
    }
}

Source Code - Boxing and Unboxing Integers

.method private hidebysig static void Main(string[] args) cil managed
{
  .entrypoint
  // Code size 22 (0x16)
  .maxstack 1
  .locals ([0] object o,
           [1] int32 i)
  IL_0000: ldc.i4 0x3039
  IL_0005: stloc.1
  // Now, the 'boxing' operations
  IL_0006: ldloc.1
  IL_0007: box [mscorlib]System.Int32
  IL_000c: stloc.0
  // Now, the 'unboxing' operations
  IL_000d: ldloc.0
  IL_000e: unbox [mscorlib]System.Int32
  IL_0013: ldind.i4
  IL_0014: stloc.1
  IL_0015: ret
} // end of method Class1::Main

IL Generated Code
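The IL above shows that box produces a separate object. A short C# sketch makes the visible consequences of this explicit. The class BoxingSemantics and its method names are hypothetical, chosen for illustration. It shows three things: the box holds its own copy of the value, boxing the same value twice yields two different objects, and unboxing requires the exact value type that was boxed.

```csharp
using System;

static class BoxingSemantics
{
    // Boxes 'value', overwrites the original variable, then unboxes.
    public static int BoxThenModify(int value)
    {
        object o = value;   // box: copies the value into a new heap object
        value = 0;          // changes only the local variable, not the box
        return (int)o;      // unbox: the box still holds the original value
    }

    // Boxes the same value twice and compares the two object references.
    public static bool BoxesAreSameObject(int value)
    {
        object a = value;
        object b = value;
        return object.ReferenceEquals(a, b);   // false: each box is a separate allocation
    }

    // Tries to unbox a boxed int directly as a long.
    public static bool CanUnboxIntAsLong(int value)
    {
        object o = value;
        try
        {
            long l = (long)o;   // unbox requires the exact boxed type (Int32 here)
            return l == value;
        }
        catch (InvalidCastException)
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(BoxThenModify(12345));     // 12345
        Console.WriteLine(BoxesAreSameObject(42));   // False
        Console.WriteLine(CanUnboxIntAsLong(42));    // False
    }
}
```

To read a boxed int as a long, unbox it to its exact type first and then convert it: (long)(int)o.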

b) Reference Types

Reference types (strings, classes, etc.) are already stored as object references when they are created. Because the compiler knows the static type of such a variable, there is no need to verify the object's type each time it is used. When we 'box' a reference type, the compiler does nothing, because, as I said before, a reference type is already stored as an object. When we use a reference type that has been stored as an object, we must write a cast in the source code, and the compiler inserts a specific castclass IL instruction. I don't know exactly how this instruction is jitted, but I imagine that, if the object's type is the same as the requested type, very few native instructions are executed.

using System;
namespace test1
{
    class Class1
    {
        static void Main(string[] args)
        {
            object o;
            string s = "12345";
            // Box and unbox
            o = (object) s; // Boxing
            s = (string) o; // Unboxing
        }
    }
}

Source Code - Boxing and Unboxing Strings

.method private hidebysig static void Main(string[] args) cil managed
{
  .entrypoint
  // Code size 16 (0x10)
  .maxstack 1
  .locals ([0] object o,
           [1] string s)
  IL_0000: ldstr "12345"
  IL_0005: stloc.1
  // 'Boxing': nothing happens except the assignment, because a string
  // is already an object reference.
  IL_0006: ldloc.1
  IL_0007: stloc.0
  // 'Unboxing': only a 'cast', which may be very efficient if the
  // requested type is the same as the stored type.
  IL_0008: ldloc.0
  IL_0009: castclass [mscorlib]System.String
  IL_000e: stloc.1
  IL_000f: ret
} // end of method Class1::Main

IL Generated Code
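For contrast with the value-type case, the following sketch shows that storing a string in an object variable copies only the reference, and that castclass performs a runtime type check that throws when the types don't match. The class ReferenceCasts and its method names are hypothetical, chosen for illustration.

```csharp
using System;

static class ReferenceCasts
{
    // Stores a string in an object variable and casts it back.
    public static bool SameObjectAfterRoundTrip(string s)
    {
        object o = s;                              // no box instruction: only the reference is copied
        string back = (string)o;                   // castclass System.String
        return object.ReferenceEquals(s, back);    // true: no copy was made
    }

    // Returns true if a castclass to string succeeds for the given object.
    public static bool CanCastToString(object o)
    {
        try
        {
            string s = (string)o;                  // throws if o is not a String (null passes)
            return object.ReferenceEquals(s, o);   // same reference when the cast succeeds
        }
        catch (InvalidCastException)
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(SameObjectAfterRoundTrip("12345")); // True
        Console.WriteLine(CanCastToString("12345"));          // True
        Console.WriteLine(CanCastToString(12345));            // False (a boxed int)
    }
}
```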

Conclusion

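As a general rule, storing reference types in collections and retrieving them has very little impact on performance. Storing value types, however, may affect performance severely, because every item must be boxed (a heap allocation plus a copy) when it is added and unboxed when it is read back.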

To test this assumption, I've written a few small test programs. Each program was executed 10 times, and the best result is reported:

using System;
using System.Collections;
namespace test1
{
    class Class1
    {
        static void Main(string[] args)
        {
            int count;
            DateTime startTime = DateTime.Now;
            ArrayList myArrayList = new ArrayList();
            // Repeat the test 5 times.
            for (int retry = 5; retry > 0; retry--)
            {
                myArrayList.Clear();
                // Add value types (ints) to the ArrayList; each one is boxed.
                for (count = 0; count < 1000000; count++)
                    myArrayList.Add(count);
                // Retrieve the values; each one is unboxed.
                int i;
                for (count = 0; count < 1000000; count++)
                    i = (int) myArrayList[count];
            }
            // Print results.
            DateTime endTime = DateTime.Now;
            Console.WriteLine("Start: {0}\nEnd: {1}\nElapsed: {2}",
                startTime, endTime, endTime - startTime);
            Console.WriteLine("Ready. Press ENTER to exit...");
            Console.ReadLine();
        }
    }
}

Test 1 - Storing/Retrieving integers to/from ArrayList

Result: On my machine, this program takes 6.409 seconds in its best run.

using System;
using System.Collections;
namespace test1
{
    class Class1
    {
        static void Main(string[] args)
        {
            int count;
            ArrayList myArrayList = new ArrayList();
            // Construct 1000000 strings (not included in the timing).
            string[] strList = new string[1000000];
            for (count = 0; count < 1000000; count++)
                strList[count] = count.ToString();
            // Start timing, then repeat the test 5 times.
            DateTime startTime = DateTime.Now;
            for (int retry = 5; retry > 0; retry--)
            {
                myArrayList.Clear();
                // Add reference types (strings) to the ArrayList; no boxing is needed.
                for (count = 0; count < 1000000; count++)
                    myArrayList.Add(strList[count]);
                // Retrieve the values; each one only needs a castclass.
                string s;
                for (count = 0; count < 1000000; count++)
                    s = (string) myArrayList[count];
            }
            // Print results.
            DateTime endTime = DateTime.Now;
            Console.WriteLine("Start: {0}\nEnd: {1}\nElapsed: {2}",
                startTime, endTime, endTime - startTime);
            Console.WriteLine("Ready. Press ENTER to exit...");
            Console.ReadLine();
        }
    }
}

Test 2 - Storing/Retrieving strings to/from ArrayList

This program takes 3.565 seconds on my machine, roughly 1.8 times faster than Test 1. As we can see, storing reference types in collections is much more efficient than storing value types.
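If you want to repeat the comparison yourself, here is a sketch that runs both tests in a single program. It is not the harness used for the results above. It assumes .NET 2.0 or later, because it uses System.Diagnostics.Stopwatch, which has finer resolution than DateTime.Now. ArrayListBenchmark, IntRoundTrip and StringRoundTrip are hypothetical names. Both loops accumulate a result so that the retrieval work has an observable effect. The string loop also reads each string's Length, which adds a little work that Test 2 does not do.

```csharp
using System;
using System.Collections;
using System.Diagnostics;

static class ArrayListBenchmark
{
    // Stores 'count' ints (each one boxed) and reads them back (each one unboxed).
    // Returns the sum of the retrieved values.
    public static long IntRoundTrip(ArrayList list, int count)
    {
        list.Clear();
        for (int n = 0; n < count; n++)
            list.Add(n);                  // boxing: one heap allocation per item
        long sum = 0;
        for (int n = 0; n < count; n++)
            sum += (int)list[n];          // unboxing
        return sum;
    }

    // Stores prebuilt strings and reads them back (castclass only, no boxing).
    // Returns the total length of the retrieved strings.
    public static long StringRoundTrip(ArrayList list, string[] items)
    {
        list.Clear();
        for (int n = 0; n < items.Length; n++)
            list.Add(items[n]);           // copies a reference only
        long totalLength = 0;
        for (int n = 0; n < items.Length; n++)
            totalLength += ((string)list[n]).Length;   // castclass
        return totalLength;
    }

    static void Main()
    {
        const int Count = 1000000;
        const int Retries = 5;
        ArrayList list = new ArrayList(Count);   // preallocate; Clear() keeps the capacity

        // Build the strings before timing, as Test 2 does.
        string[] items = new string[Count];
        for (int n = 0; n < Count; n++)
            items[n] = n.ToString();

        Stopwatch sw = Stopwatch.StartNew();
        for (int r = 0; r < Retries; r++)
            IntRoundTrip(list, Count);
        sw.Stop();
        Console.WriteLine("ints:    {0} ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        for (int r = 0; r < Retries; r++)
            StringRoundTrip(list, items);
        sw.Stop();
        Console.WriteLine("strings: {0} ms", sw.ElapsedMilliseconds);
    }
}
```

The absolute timings, and even the ratio between them, will depend on the machine, the runtime version and the JIT, so treat the output as a local measurement rather than a general figure.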