Fast Equality Comparison

Roman Sakno
5y
33.6k
0
4

Article

Preface

Have you experienced the problem of comparing two arrays in a fast way Or comparing two structs? No, I'm not talking about for/foreach loop, used to iterate over elements, for comparison between each other because it's slow. However, for arrays of reference type, it is the only way to do so. This is because the array, actually stores the reference (address) to the real object, instead of its content. Therefore, such arrays are out of scope of this article. Let's concentrate on a situation, when array elements are of value type.

You can say that "wait, there is SequenceEqual method for such purposes". I thought that too. But, this way has the following limitations:

It requires that type T should implement IEquatable interface
If T is not blittable type then equality comparison cannot perform fast bitwise comparison
Can compare two structs of the same type

Moreover, .NET Core 2.x and .NET Framework 4.x have additional restrictions: type T should be of a primitive type. Just take a look at this code. The constrained is caused by private method IsTypeComparableAsBytes. It checks whether the type T is primitive type. Since 3.0, this limitation is fixed. But, it still uses unaligned memory access, because there is no guarantee that the pointer stored in Span<T> is aligned. The unaligned memory access is bad and here you can find the answer why.

In this article, I'll show you how avoid all of these limitations and achieve the best possible performance, when comparing objects of value type and arrays with elements of value types. It is a good demonstration of low-level capabilities of C# and .NET comparable to C++ features, to work with memory directly through pointers.

Value Types

Before working with arrays, it is necessary to find a way to check equality of two arbitrary value type events, if the type doesn't implement IEquatable interface. To do that, we need to understand the nature of value type. The object of such type represents continuous memory block, regardless of its location - boxed, field, or local variable. From the machines perspective, there is no difference between fields of reference type or value type, inside of struct. Therefore, two blocks of memory can be compared in bitwise manner. Obviously, if all bits in two blocks of memory are equal, then two values are equal as well. Field of reference type is represented as GC trackable pointer, which has native size depending on the underlying platform. Therefore, it is possible to reinterpret a reference, to an object, as native-sized int, e.g. IntPtr. No need to pin it in the managed heap, because no modifications expected. However, false positives are possible, because the pointer can be modified by GC.

It remains to overcome the last limitation: the object of value type should not be moved, by GC during the collection phase. This effect can be achieved by the following ways:

Use pinned reference
Place value on the stack. Stack memory is not touched by GC

I decided to use the second approach for its simplicity. Now, we can define a shape of equality comparison method,

public static bool Equals<T, G>(T first, G second) where T : struct where G : struct

It should take into account trivial situations when T is a primitive type. In those cases, the method can use ceq IL instruction to perform a fast equality check. If size of T is more than IntPtr.Size(native-sized int), then a more intelligent comparison should be performed.

Arbitrary access to the raw memory of value type, in C#, can be done in two ways,

Unsafe class
IL inline assembly with InlineIL.Fody

In the original implementation, I use second approach. For the article, I prefer the first way, because I don't want to blow your mind! Now, we have all of the necessary tools for the implementation of Equals method.

public static bool Equals<T, G>(T first, G second)
where T : struct
where G : struct
=> Equals(ref first, ref second);
private unsafe static bool Equals<G>(ref T first, ref G second)
{
var size = Unsafe.SizeOf<T>();
if (size != Unsafe.SizeOf<G>())
return false;
switch(size)
{
default:
return EqualsAligned((IntPtr)Unsafe.AsPointer(ref first), (IntPtr)Unsafe.AsPointer(ref second), size);
case sizeof(byte):
return Unsafe.As<T, byte>(ref first) == Unsafe.As<G, byte>(ref second);
case sizeof(short):
return Unsafe.As<T, short>(ref first) == Unsafe.As<G, short>(ref second);
case sizeof(int):
return Unsafe.As<T, int>(ref first) == Unsafe.As<G, int>(ref second);
case sizeof(long):
return Unsafe.As<T, int>(ref first) == Unsafe.As<G, int>(ref second);
}
}

Let's sort it all out,

Line#4
Public method accepts comparison operands on the stack. Now, there is no need to worry about GC and the managed pointer (type T&, or ref T) can be obtained safely without pinning
Line #9
Type T and G should have identical size to perform further operations. If their size is not equal, then just return false.
Line #11
Process cases when size of value type is equal to the size of primitive types. In this case, the equality check can be performed using comparison operator, which is translated into ceq instruction. Unsafe.As method allows to reinterpret type of managed pointer
Line #14
The size of value type is larger than native sized int, call EqualsAligned method that performs optimized comparison of two blocks of memory

Unsafe.SizeOf method is just an alias to sizeof IL. Instruction always produces a constant value, so JIT compiler can replace the expression with it. After that, dead code elimination comes to play. Different branches of switch statement can be dropped by JIT compiler, because they are never reachable for particular types T and G. If these types have the size less than or equal to sizeof(long), then the body of the method will be replaced by ceq instruction and the entire method can be inlined. This approach gives us the same performance as standard equality operator ==. If value types have greater size, then only default branch is used.

EqualsAligned method, implements general algorithm of comparison. But, it should be fast. To achieve the best performance, we can utilize vectorization. Most of modern CPUs support extensions, to work with vectors, that are larger than general-purpose registers. .NET supports this extension via Vector<T> data type. Let's use it for our implementation,

private static bool EqualsAligned(IntPtr first, IntPtr second, long length)
{
var result = false;
if (Vector.IsHardwareAccelerated)
while (length >= Vector<byte>.Count)
if (ReadVector(ref first) == ReadVector(ref second))
length -= Vector<byte>.Count;
else
goto exit;
while (length >= sizeof(UIntPtr))
if (Read<UIntPtr>(ref first) == Read<UIntPtr>(ref second))
length -= sizeof(UIntPtr);
else
goto exit;
while (length > sizeof(byte))
if (Read<byte>(ref first) == Read<byte>(ref second))
length -= sizeof(byte);
else
goto exit;
result = true;
exit:
return result;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static Vector<byte> ReadVector(ref IntPtr source)
{
var result = Unsafe.Read<Vector<byte>>(source.ToPointer());
source += Vector<byte>.Count;
return result;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static T Read<T>(ref IntPtr source) where T : unmanaged
{
var result = Unsafe.Read<T>(source.ToPointer());
source += sizeof(T);
return result;
}

Here I'm trying to compare memory chunks sequentially, as much as possible, using Vector<T> data type. ReadVector is here because C# 7.x doesn't support constructed unmanaged types. Starting from C# 8, it is not an issue anymore and method can be removed.

If chunk of memory is less than the size of the vector, then use native-sized int. Trailing bytes are checked by simple comparison of bytes. As you can see, this method is not aware about the layout of fields inside of value type and interprets it as continuous array of bytes, through untyped pointer represented by IntPtr data type. Unsafe.Read gives aligned memory access, so the entire implementation is probably faster than SequenceEqual on platforms, where unaligned access is slow.

Arrays

Now, we have a tool in the form of EqualsAligned for fast comparison of two arrays. Both arrays should be pinned in memory to avoid re-allocation of array elements, caused by GC.

public static unsafe bool BitwiseEquals<T>(this T[] first, T[] second) where T : unmanaged
{
if (first is null || second is null)
return ReferenceEquals(first, second);
if (first.LongLength != second.LongLength)
return false;
if (first.LongLength == 0)
return true;
fixed (T* firstPtr = first, secondPtr = second)
return EqualsAligned(new IntPtr(firstPtr), new IntPtr(secondPtr), first.LongLength * sizeof(T));
}

The main benefit of such an approach is an ability to compare more than one array element, at a time. If size of T is less than vector size, then the algorithm packs as many as possible, elements into single vector and perform hardware accelerated equality check.

This implementation supports a large array when index overflows int data type. This is not possible with Span<T> because it holds the length of the array as int.

Default Value

Another interesting case, is to determine whether the value of particular type T is default(T), i.e. initialized by default. This API is proposed for future versions, but not yet implemented. Our new method Equals<T, G>, allows to do that as follows:

public static bool IsDefault<T>(T value) where T : struct
=> Equals<T, T>(value, default(T));

It is not efficient because objects of type T are copied multiple times, through the stack. Moreover, default(T) is redundant, because default value is always a block of memory filled with zeroes, that have the size equal to sizeof(T). So, it is enough to ensure that the memory occupied by value of type T is filled with zeroes. Another inconvenience is that T should be value type. But, we want to have a universal method that works for reference and value types. For reference types, the method should perform trivial null check.

The task seems like simpler than comparison of two value types. But, here is a problem with reference types, there is no way to obtain a pointer to the managed reference. However, different types of pointers are indistinguishable at IL level. Managed reference (type O), managed pointers (type T&) and unmanaged pointers (type T*) are mutually convertible. Unsafe class can help us to obtain a pointer to the managed reference.

private unsafe static bool IsDefault<T>(T value)
{
var ptr = Unsafe.AsPointer(ref value);
var size = Unsafe.SizeOf<T>();
switch(size)
{
default:
return IsZeroAligned(new IntPtr(ptr), size);
case sizeof(byte):
return Unsafe.AsRef<byte>(ptr) == 0;
case sizeof(ushort):
return Unsafe.AsRef<ushort>(ptr) == 0;
case sizeof(uint):
return Unsafe.AsRef<uint>(ptr) == 0;
case sizeof(ulong):
return Unsafe.AsRef<ulong>(ptr) == 0UL;
}
}

If size of T is less than 64 bit, then the check can be replaced with trivial equality operation == (ceq in IL) where the second operation is 0. Therefore, the performance of the method when T is primitive or refernece type is comparable with 1-2 assembly instructions. IsZeroAligned method is called when size of T is greater than 64 bit. The implementation of this method is similar to EqualsAligned and use the same optimization techniques,

private static bool IsZeroAligned(IntPtr address, long length)
{
var result = false;
if (Vector.IsHardwareAccelerated)
while (length >= Vector<byte>.Count)
if (ReadVector(ref address) == Vector<byte>.Zero)
length -= Vector<byte>.Count;
else
goto exit;
while (length >= sizeof(UIntPtr))
if (Read<UIntPtr>(ref address) == UIntPtr.Zero)
length -= sizeof(UIntPtr);
else
goto exit;
while (length > sizeof(byte))
if (Read<byte>(ref address) == 0)
length -= sizeof(byte);
else
goto exit;
result = true;
exit:
return result;
}

All comparisons in this method are performed between value dereferenced from the memory block and zero value. .NET provides default vector filled with zeroes, so we can use it to compare two blocks of memory, with maximum possible size.

Conclusion

Is it really faster in comparison with alternative methods? Yes, it is 5-6 times faster than SequenceEqual and 10 times faster than for loop. Detailed table with benchmark results is here and here. On .NET Core 3.x and x86-64 platform SequenceEqual from .NET and Equals from our implementation have the equal performance.

It was a good demonstration of low-level capabilities provided by .NET platform. However, I'm not recommended to copy-paste this code into regular application. Instead, use libraries to get these features:

If you're using .NET Core 3.x, support of different generics types is not required, unaligned memory access is not an issue and absence of IsDefault method is OK for you then use MemoryExtensions class
If you need all the features without limitation from .NET Core side or they should be compatible with .NET Standard 2.0 runtimes (.NET Core 2.x, .NET Framework 4.x) then use BitwiseComparer<T> and Intrinsics classes from .NEXT library.

The fragments of implementation were taken from here, here and here.