UTF-8 String Literals In C# 11 | Learn C#

Introduction

In this article, we are going to discuss UTF-8 string literals, a new feature in C# 11, with examples. The C# programming language was launched in 2000 and is now at version 11. Previous articles have covered some of the other features of C# 11.

UTF-8 string literals

UTF-8 is an important text encoding for the web and is widely used in the .NET stack. Much of the data arrives as byte[] from the network stack, but code also makes significant use of constants. For example, the networking stack frequently writes constants such as "HTTP/1.0\r\n", "AUTH", and "Content-Length:". Because of this, UTF-8 string literals are useful in many cases.

U8 suffix on string literals

To specify UTF-8 encoding for a string literal, we can add the u8 suffix to it. This makes it easier to create UTF-8 strings when our program needs them for HTTP string constants or other text protocols.
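As a minimal sketch of the HTTP use case, the snippet below writes a few protocol constants to a stream. The MemoryStream stands in for a real network stream, and the status line and header values are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

// A MemoryStream stands in for a network stream in this sketch.
using var stream = new MemoryStream();

// Each u8 literal is already a ReadOnlySpan<byte>, so no encoding call is needed.
stream.Write("HTTP/1.0 200 OK\r\n"u8);
stream.Write("Content-Length: 0\r\n"u8);
stream.Write("\r\n"u8);

// Decode the bytes again only to show what was written.
string wire = Encoding.UTF8.GetString(stream.ToArray());
Console.Write(wire);
```

Without the suffix, each constant would have to go through a call such as Encoding.UTF8.GetBytes before it could be written.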

When the u8 suffix is used, the resulting value is a ReadOnlySpan<byte> that contains a UTF-8 representation of the string as a sequence of bytes. A null terminator is placed outside of the length of the ReadOnlySpan<byte> to handle certain interop scenarios where null-terminated strings are expected. This allows the resulting string to be used in these scenarios without requiring additional processing.
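The short example below illustrates that the span's length counts only the encoded bytes, not the null terminator. The variable names are chosen for illustration, and the euro sign is written as an escape sequence so the result does not depend on the source file's encoding.

```csharp
using System;

ReadOnlySpan<byte> auth = "AUTH"u8;
int authLength = auth.Length;               // 4: the null terminator is not counted

// \u20AC is the euro sign, which UTF-8 encodes as three bytes.
ReadOnlySpan<byte> euro = "\u20AC"u8;
int euroLength = euro.Length;               // 3
string euroHex = Convert.ToHexString(euro); // "E282AC"

Console.WriteLine($"{authLength} {euroLength} {euroHex}");
```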

In .NET, strings are encoded using UTF-16, whereas UTF-8 is the standard for Web protocols and several significant libraries. As of C# 11, a string literal can take the u8 suffix to signify UTF-8 encoding. UTF-8 literals are stored as ReadOnlySpan<byte> objects, and ReadOnlySpan<byte> is also the natural type of a UTF-8 string literal. Using a UTF-8 string literal gives a clearer declaration than building the equivalent System.ReadOnlySpan<T> by hand. The code below declares two such literals and, for comparison, the UTF-16 bytes of the same character:

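using System;      // ReadOnlySpan<T>
using System.Text; // Encoding
ReadOnlySpan<byte> text = "Rama Sagar"u8;                 // 10 UTF-8 bytes
ReadOnlySpan<byte> u16A = Encoding.Unicode.GetBytes("A"); // UTF-16 (little-endian): 0x41, 0x00
ReadOnlySpan<byte> u8A = "A"u8;                           // UTF-8: 0x41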

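To store a UTF-8 string literal as an array, call ReadOnlySpan<T>.ToArray(), which copies the bytes of the literal into a new, mutable array. The first line below writes the bytes of "789" out by hand; the second obtains the same bytes as an array from a literal.

ReadOnlySpan<byte> u8Span = new byte[] { 55, 56, 57 }; // bytes of "789", written by hand
byte[] u8Array = "789"u8.ToArray();                    // the same bytes, copied into a mutable array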

UTF-8 string literals cannot be used as the default value for an optional parameter because they are runtime constants, not compile-time constants. Additionally, we cannot use string interpolation with UTF-8 string literals, because the $ token and the u8 suffix cannot be combined in the same string expression.
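One possible way to work around both restrictions is sketched below. The Describe helper and the header value are hypothetical and exist only for this example; the sketch assumes that an empty span can be treated as "no value supplied".

```csharp
using System;
using System.Text;

// Would not compile: a u8 literal is not a compile-time constant.
// static string Describe(ReadOnlySpan<byte> verb = "GET"u8) { ... }

// Hypothetical helper: default to an empty span and substitute the literal at run time.
static string Describe(ReadOnlySpan<byte> verb = default)
{
    if (verb.IsEmpty) verb = "GET"u8;
    return Encoding.UTF8.GetString(verb);
}

// Would not compile: $ and u8 cannot be combined.
// var header = $"Content-Length: {length}\r\n"u8;

// Instead, interpolate as a normal string and encode it afterwards.
int length = 42;
byte[] header = Encoding.UTF8.GetBytes($"Content-Length: {length}\r\n");

Console.WriteLine(Describe());         // GET
Console.WriteLine(Describe("POST"u8)); // POST
Console.WriteLine(header.Length);      // 20
```

Note that with this approach a caller who deliberately passes an empty span also gets the fallback value, and the interpolated header is encoded at run time rather than stored as a literal.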

Put differently, the compiler knows the type of a UTF-8 string literal (ReadOnlySpan<byte>) at compile time, but the value itself is not a compile-time constant. This is why attempting to use a UTF-8 string literal as the default value of a parameter results in a compilation error.

Conclusion

It is worth noting that UTF-8 string literals are primarily used in web scenarios and may not be used frequently. However, if we do need to use this feature, this post should provide useful information.

