Using Span<T> To Improve Performance Of C# Code

Bohdan Stupak

In my experience, the main way to improve application performance is to reduce the number and duration of IO calls. Once that option is exhausted, another path developers take is using memory on the stack. The stack allows very fast allocation and deallocation, although it should be used only for small allocations since the stack size is quite limited. Using the stack also reduces pressure on the GC. To allocate memory on the stack, one uses value types or the stackalloc operator combined with unmanaged memory access.

The second option is rarely used by developers since the API for unmanaged memory access is quite verbose.

Span<T> is a family of value types that arrived with C# 7.2 and provides an allocation-free representation of memory from different sources. Span<T> allows developers to work with regions of contiguous memory in a more convenient fashion while ensuring memory and type safety.
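To make "memory from different sources" concrete, here is a minimal sketch in which one method consumes spans created over a heap array and over stack memory. The class and method names are hypothetical and chosen only for illustration.

```csharp
using System;

public static class SpanSources
{
    // Sums any contiguous block of ints, regardless of where the memory lives.
    public static int Sum(ReadOnlySpan<int> values)
    {
        var total = 0;
        foreach (var value in values)
        {
            total += value;
        }
        return total;
    }

    public static int SumFromArray()
    {
        int[] numbers = { 1, 2, 3, 4 };
        // A span over the two middle elements of a heap array; nothing is copied.
        return Sum(numbers.AsSpan(1, 2));
    }

    public static int SumFromStack()
    {
        // A span over stack memory; no unsafe context is required.
        Span<int> numbers = stackalloc int[3];
        numbers[0] = 10;
        numbers[1] = 20;
        numbers[2] = 30;
        return Sum(numbers);
    }
}
```

Here SumFromArray() returns 5 and SumFromStack() returns 60, and Sum does not need to know which kind of memory it was given.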

Span implementation

Ref return

For those who don’t closely follow updates to the C# language, the first step in wrapping your head around the Span<T> implementation is learning about ref returns, which were introduced in C# 7.0.

While most readers are familiar with passing method arguments by reference, C# now also allows returning a reference to a value instead of the value itself.

Let us examine how it works. We’ll create a simple wrapper around an array of prominent musicians that exhibits both traditional behavior and new ref return features.

public class ArtistsStore   
{   
    private readonly string[] _artists = new[] { "Amenra", "The Shadow Ring", "Hiroshi Yoshimura" };   
    public string ReturnSingleArtist()   
    {   
        return _artists[1];   
    }   
    public ref string ReturnSingleArtistByRef()   
    {   
        return ref _artists[1];   
    }   
    public string AllArtists => string.Join(", ", _artists);   
}

Now, let’s call those methods.

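var store = new ArtistsStore();
// A plain return hands back a copy of the reference; reassigning it leaves the array intact.
var artist = store.ReturnSingleArtist();
artist = "Henry Cow";
var allArtists = store.AllArtists; // Amenra, The Shadow Ring, Hiroshi Yoshimura

// Without the ref keywords at the call site, the ref return is still read as a value.
artist = store.ReturnSingleArtistByRef();
artist = "Frank Zappa";
allArtists = store.AllArtists; // Amenra, The Shadow Ring, Hiroshi Yoshimura

// A ref local aliases the array element itself, so the assignment changes the array.
ref var artistReference = ref store.ReturnSingleArtistByRef();
artistReference = "Valentyn Sylvestrov";
allArtists = store.AllArtists; // Amenra, Valentyn Sylvestrov, Hiroshi Yoshimura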

Observe that in the first and second examples the original collection is unmodified. In the final example, we’ve managed to alter the second artist of the collection. As you’ll see later in the article, this feature will help us operate on arrays located on the stack in a reference-like fashion.

Ref structs

As we know, value types might be allocated on the stack, but not necessarily: it depends on the context where the value is used. For example, a struct that is a field of a class lives on the heap together with its owner. To make sure that a value is always allocated on the stack, the concept of a ref struct was introduced in C# 7.2. Span<T> is a ref struct, so we can be sure that it is always allocated on the stack.

Span implementation

Span<T> is a ref struct that contains a pointer to memory and the length of the span similar to below.

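// Simplified sketch of the layout; not compilable as written.
public readonly ref struct Span<T>
{
    private readonly ref T _pointer;   // managed reference to the first element
    private readonly int _length;      // number of elements in the span
    // Conceptually returns a reference to the element at the given offset.
    public ref T this[int index] => ref _pointer + index;
    // The real type has further members (Length, Slice, CopyTo, ...).
}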

Note the ref modifier on the pointer field. At the time of writing, such a construct could not be declared in plain C#, so in .NET Core it was implemented via the internal ByReference<T> type. C# 11 later added ref fields to the language.

So, as you can see, indexing is implemented via ref return, which allows reference-type-like behavior for stack-only structs.
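The effect of the ref-returning indexer can be observed from the outside. The following is a small sketch (the class and method names are hypothetical) that modifies an array through a span wrapped around it.

```csharp
using System;

public static class SpanIndexerDemo
{
    public static int[] DoubleFirstAndLast()
    {
        int[] numbers = { 1, 2, 3 };
        Span<int> span = numbers;          // implicit conversion, no copy is made

        // The indexer returns a reference, so this writes through to numbers[0].
        span[0] *= 2;

        // The reference can also be captured in a ref local.
        ref int last = ref span[span.Length - 1];
        last *= 2;                         // writes through to numbers[2]

        return numbers;                    // now contains 2, 2, 6
    }
}
```

The array is never copied: both assignments go through the span directly into the original elements.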

Span limitations

To ensure that a ref struct such as Span<T> is always used on the stack, it comes with a number of limitations: it can’t be boxed; it can’t be assigned to variables of type object, dynamic, or any interface type; it can’t be a field in a reference type; and it can’t be used across await and yield boundaries. In addition, calls to two methods, Equals and GetHashCode, throw a NotSupportedException.
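The sketch below illustrates a few of these limitations as commented-out lines that the compiler rejects. It also shows one common way to live with the await restriction: accepting Memory<T> and taking its Span only inside a synchronous call. That workaround is an assumption added for illustration rather than something the text above prescribes, and all names are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class SpanLimitationsDemo
{
    // Each commented-out line below is rejected at compile time:
    //
    //   object boxed = new Span<int>(new int[3]);   // a span cannot be boxed
    //   class Holder { Span<int> _data; }           // a span cannot be a field of a class
    //
    // Memory<T> is an ordinary struct, so it may be a parameter of an async method.
    public static async Task<int> SumAfterDelayAsync(Memory<int> data)
    {
        await Task.Delay(1);
        // The span exists only as a temporary inside this expression,
        // so it never has to survive an await.
        return Sum(data.Span);
    }

    private static int Sum(ReadOnlySpan<int> values)
    {
        var total = 0;
        foreach (var value in values)
        {
            total += value;
        }
        return total;
    }
}
```

An array converts implicitly to Memory<int>, so the method can be called as SumAfterDelayAsync(new[] { 1, 2, 3 }).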

Using Span instead of string

Reworking an existing codebase

Let’s examine code that converts Linux permissions to their octal representation. Here is the original code:

using System;
using System.Collections.Generic;

internal class SymbolicPermission
{
    private struct PermissionInfo
    {
        public int Value { get; set; }
        public char Symbol { get; set; }
    }
    private const int BlockCount = 3;
    private const int BlockLength = 3;
    private const char MissingPermissionSymbol = '-';
    private readonly static Dictionary<int, PermissionInfo> Permissions = new Dictionary<int, PermissionInfo>()
    {
        { 0, new PermissionInfo { Symbol = 'r', Value = 4 } },
        { 1, new PermissionInfo { Symbol = 'w', Value = 2 } },
        { 2, new PermissionInfo { Symbol = 'x', Value = 1 } }
    };
    private string _value;
    private SymbolicPermission(string value)
    {
        _value = value;
    }
    public static SymbolicPermission Parse(string input)
    {
        if (input.Length != BlockCount * BlockLength)
        {
            throw new ArgumentException("input should be a string 3 blocks of 3 characters each");
        }

        for (var i = 0; i < input.Length; i++)
        {
            TestCharForValidity(input, i);
        }
        return new SymbolicPermission(input);
    }
    public int GetOctalRepresentation()
    {
        var res = 0;
        for (var i = 0; i < BlockCount; i++)
        {
            var block = GetBlock(i);
            res += ConvertBlockToOctal(block) * (int)Math.Pow(10, BlockCount - i - 1);
        }
        return res;
    }
    private static void TestCharForValidity(string input, int position)
    {
        var index = position % BlockLength;
        var expectedPermission = Permissions[index];
        var symbolToTest = input[position];
        if (symbolToTest != expectedPermission.Symbol && symbolToTest != MissingPermissionSymbol)
        {
            throw new ArgumentException($"invalid input in position {position}");
        }
    }
    private string GetBlock(int blockNumber)
    {
        return _value.Substring(blockNumber * BlockLength, BlockLength);
    }
    private int ConvertBlockToOctal(string block)
    {
        var res = 0;
        foreach (var (index, permission) in Permissions)
        {
            var actualValue = block[index];
            if (actualValue == permission.Symbol)
            {
                res += permission.Value;
            }
        }
        return res;
    }
}
public static class SymbolicUtils
{
    public static int SymbolicToOctal(string input)
    {
        var permission = SymbolicPermission.Parse(input);
        return permission.GetOctalRepresentation();
    }
}

The reasoning is pretty straightforward: a string is essentially an array of chars, so why not work with it through a stack-only view instead of creating new strings on the heap, as Substring does?

So, our first goal is to declare the field _value of SymbolicPermission as ReadOnlySpan<char> instead of string. To achieve this, we must declare SymbolicPermission as a ref struct, since a field or property cannot be of type Span<T> unless it is an instance member of a ref struct.

internal ref struct SymbolicPermission  
{  
    ...  
    private ReadOnlySpan<char> _value;  
}

Now, we just change every string within our reach to ReadOnlySpan<char>. The only point of interest is the GetBlock method since here we replace Substring with Slice.

private ReadOnlySpan<char> GetBlock(int blockNumber)  
{  
    return _value.Slice(blockNumber * BlockLength, BlockLength);  
}
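Putting the pieces together, a condensed span-based conversion might look like the sketch below. It is a simplified, hypothetical variant written for illustration, not the author's reworked class: it drops the Permissions dictionary and the per-character validation and hard-codes the r, w, x positions.

```csharp
using System;

public static class SpanPermissionSketch
{
    private const int BlockCount = 3;
    private const int BlockLength = 3;

    public static int SymbolicToOctal(ReadOnlySpan<char> input)
    {
        if (input.Length != BlockCount * BlockLength)
        {
            throw new ArgumentException("input should consist of 3 blocks of 3 characters each");
        }

        var result = 0;
        for (var i = 0; i < BlockCount; i++)
        {
            // Slice returns a view over the same characters; Substring would copy them.
            var block = input.Slice(i * BlockLength, BlockLength);
            result = result * 10 + ConvertBlockToOctal(block);
        }
        return result;
    }

    private static int ConvertBlockToOctal(ReadOnlySpan<char> block)
    {
        var value = 0;
        if (block[0] == 'r') value += 4;
        if (block[1] == 'w') value += 2;
        if (block[2] == 'x') value += 1;
        return value;
    }
}
```

Because string converts implicitly to ReadOnlySpan<char>, callers can still write SpanPermissionSketch.SymbolicToOctal("rwxr-xr--"), which yields 754.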

Evaluation

Let’s measure the outcome.

Outcome

We see a speed-up of about 50 nanoseconds, which is roughly a 10% performance improvement. One can argue that 50 nanoseconds is not that much, but it cost us almost nothing to achieve!

Now let’s evaluate the change on a longer permission string, with 18 blocks of 12 characters each, to see whether the gain becomes more significant.

Improvements

As you can see, we’ve gained 0.5 microseconds, or a 5% performance improvement. Again, it may look like a modest achievement. But remember that this was really low-hanging fruit.
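One rough way to see where such a gain comes from is to count the bytes allocated by each approach. The sketch below uses GC.GetAllocatedBytesForCurrentThread for this. It is an illustrative probe with hypothetical names, not the benchmark behind the numbers above, and the exact byte counts depend on the runtime.

```csharp
using System;

public static class AllocationProbe
{
    // Returns the bytes allocated while taking a substring 'iterations' times.
    public static long MeasureSubstring(string input, int iterations, out int checksum)
    {
        checksum = 0;
        var before = GC.GetAllocatedBytesForCurrentThread();
        for (var i = 0; i < iterations; i++)
        {
            // Each call creates a new string on the heap.
            checksum += input.Substring(3, 3)[0];
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Returns the bytes allocated while slicing a span 'iterations' times.
    public static long MeasureSlice(string input, int iterations, out int checksum)
    {
        checksum = 0;
        var before = GC.GetAllocatedBytesForCurrentThread();
        for (var i = 0; i < iterations; i++)
        {
            // A slice is only a new (reference, length) pair on the stack.
            checksum += input.AsSpan().Slice(3, 3)[0];
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

Calling both methods with the same input and iteration count should show the Substring variant allocating noticeably more than the Slice variant.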

Using Span instead of arrays

Let’s move on to arrays of other types. Consider the example from the ASP.NET Channels pipeline. The reasoning behind the code below is that data often arrives in chunks over the network, which means that a single piece of data may be spread across multiple buffers. In the example, such data is parsed to a uint.

public unsafe static uint GetUInt32(this ReadableBuffer buffer)
{
    ReadOnlySpan<byte> textSpan;
    if (buffer.IsSingleSpan) // if data in single buffer, it’s easy
    {
        textSpan = buffer.First.Span;
    }
    else if (buffer.Length < 128) // else, consider temp buffer on stack
    {
        var data = stackalloc byte[128];
        var destination = new Span<byte>(data, 128);
        buffer.CopyTo(destination);
        textSpan = destination.Slice(0, buffer.Length);
    }
    else
    {
        // else pay the cost of allocating an array
        textSpan = new ReadOnlySpan<byte>(buffer.ToArray());
    }
    uint value;
    // yet the actual parsing routine is always the same and simple
    if (!Utf8Parser.TryParse(textSpan, out value))
    {
        throw new InvalidOperationException();
    }
    return value;
}

Let’s break down what happens here. Our goal is to parse the sequence of bytes in textSpan into a uint.

if (!Utf8Parser.TryParse(textSpan, out value)) 
{  
    throw new InvalidOperationException();  
}  
return value;

Now, let’s have a look at how textSpan is populated from the input parameter. The input parameter is an instance of a buffer that can read a sequential series of bytes.

ReadableBuffer implements ISequence<ReadOnlyMemory<byte>>, which basically means that it consists of multiple memory segments.

In case the buffer consists of a single segment, we just use the underlying Span from the first segment.

if (buffer.IsSingleSpan) 
{  
    textSpan = buffer.First.Span;  
}

Otherwise, if the data is shorter than 128 bytes, we allocate a temporary buffer on the stack and create a Span<byte> based on it.

var data = stackalloc byte[128];  
var destination = new Span<byte>(data, 128);

Then, we use the method buffer.CopyTo(destination), which iterates over each memory segment of a buffer and copies it to a destination Span. After that, we just slice a Span of the buffer’s length.

textSpan = destination.Slice(0, buffer.Length);

This example shows us that the new Span<T> API allows us to work with memory manually allocated on a stack in a much more convenient fashion than prior to its arrival.
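The same idea can be sketched with APIs available today and without the unsafe keyword. In the sketch below, a jagged byte array stands in for the segments of the buffer; this is an assumption made to keep the example self-contained, and the class and method names are hypothetical. It uses the Utf8Parser.TryParse overload from System.Buffers.Text that also reports the number of bytes consumed.

```csharp
using System;
using System.Buffers.Text;

public static class ChunkedParser
{
    private const int StackBufferSize = 128;

    // 'chunks' is a hypothetical stand-in for data spread over several buffers.
    public static uint ParseUInt32(byte[][] chunks)
    {
        var totalLength = 0;
        foreach (var chunk in chunks)
        {
            totalLength += chunk.Length;
        }

        // Small inputs use stack memory; larger ones fall back to a heap array.
        Span<byte> destination = totalLength <= StackBufferSize
            ? stackalloc byte[StackBufferSize]
            : new byte[totalLength];

        // Copy every chunk into the contiguous destination, one after another.
        var written = 0;
        foreach (var chunk in chunks)
        {
            chunk.AsSpan().CopyTo(destination.Slice(written));
            written += chunk.Length;
        }

        // Parse only the part of the buffer that was actually filled.
        ReadOnlySpan<byte> text = destination.Slice(0, written);
        if (!Utf8Parser.TryParse(text, out uint value, out int consumed) || consumed != text.Length)
        {
            throw new InvalidOperationException("input is not a valid unsigned integer");
        }
        return value;
    }
}
```

For example, passing two chunks that hold the UTF-8 bytes of "12" and "345" returns 12345.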

Conclusion

Span<T> provides a safe and easy-to-use alternative to working with stackalloc through raw pointers, which makes performance improvements easy to obtain. While the gain from each individual usage is relatively small, using it consistently helps avoid what is known as death by a thousand cuts. Span<T> is widely used across the .NET Core 3.0 codebase, which contributed to a performance improvement compared to the previous version.

Here are some things you might consider when you decide whether you should use Span<T>.

If your method accepts an array of data and doesn’t change its size, consider accepting Span<T>. If it doesn’t modify the input either, consider ReadOnlySpan<T>.
If your method accepts a string to compute some statistics or to perform syntactic analysis, consider accepting ReadOnlySpan<char>.
If your method needs a short temporary array of data, you can avoid a heap allocation with Span<T> buf = stackalloc T[size]. Remember that T must be an unmanaged value type and that a span over stackalloc memory cannot be returned from the method that created it.
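These guidelines can be sketched as three small methods. The names are hypothetical and the 64-character threshold is an arbitrary assumption. The stack buffer is used only as scratch space inside its method, since a span over stackalloc memory cannot outlive the method that created it.

```csharp
using System;

public static class SpanGuidelines
{
    // Read-only analysis of text: accepts strings, slices of strings, and char arrays.
    public static int CountDigits(ReadOnlySpan<char> text)
    {
        var count = 0;
        foreach (var c in text)
        {
            if (char.IsDigit(c))
            {
                count++;
            }
        }
        return count;
    }

    // In-place modification without resizing: accepts arrays and stack memory alike.
    public static void Negate(Span<int> values)
    {
        for (var i = 0; i < values.Length; i++)
        {
            values[i] = -values[i];
        }
    }

    // Short scratch buffer: stack memory for small inputs, a heap array otherwise.
    public static string Reverse(ReadOnlySpan<char> text)
    {
        Span<char> buffer = text.Length <= 64
            ? stackalloc char[64]
            : new char[text.Length];
        buffer = buffer.Slice(0, text.Length);

        text.CopyTo(buffer);
        buffer.Reverse();

        // Only the resulting string leaves the method, never the span itself.
        return buffer.ToString();
    }
}
```

With these definitions, CountDigits("a1b22") returns 3 and Reverse("abc") returns "cba".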