Playing With Strings And Bytes/Byte - Arrays C#

Jin Vincent Necesario
Aug 26, 2019

Introduction

Most developers I’ve met who work with strings and bytes are curious about what is actually stored on the physical device. They wonder, “What’s really inside?” Even I sometimes ask myself, “Do we really need to see the ones and zeroes?”

You can work with bytes, bits, and strings and inspect the bytes that represent a string. In this article, we will explore different ways to convert byte arrays to strings and strings to byte arrays. We will also touch on encodings, focusing on the “GetBytes” and “GetByteCount” methods and the “BitConverter” class.

Background

Before we play with strings and bytes, I want to give a brief summary of the basic concepts of ASCII and Unicode. Here are some points to take note of:

ASCII (American Standard Code for Information Interchange) and Unicode are character-encoding standards. They let computers agree on which number represents which character, so that text can be transferred from one computer to another.
ASCII uses 7 bits to represent a character. It was later extended to 8 bits (“extended ASCII”), which covers the Latin alphabet, while Unicode covers most of the world’s writing systems. Most of those characters do not fit into 8 bits.
That is where the Unicode encodings come into play: they define different ways to store a character as a byte sequence. The common ones are UTF-8, UTF-16, and UTF-32.
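
To make the last point concrete, here is a minimal sketch that measures how many bytes the same text needs in each Unicode encoding. The EncodingSizes class and its Measure method are hypothetical names made up for this illustration; only the Encoding properties and GetByteCount come from .NET.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of .NET.
public static class EncodingSizes
{
    // Returns the number of bytes the text needs in UTF-8, UTF-16, and UTF-32.
    public static (int Utf8, int Utf16, int Utf32) Measure(string text)
    {
        return (
            Encoding.UTF8.GetByteCount(text),
            Encoding.Unicode.GetByteCount(text), // Encoding.Unicode is UTF-16 (little-endian)
            Encoding.UTF32.GetByteCount(text));
    }
}
```

For instance, EncodingSizes.Measure("A") should report 1, 2, and 4 bytes, while EncodingSizes.Measure("プ") should report 3, 2, and 4 bytes. UTF-8 is the compact choice for Latin text, but not necessarily for other scripts.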

For more information about the difference between ASCII and Unicode, please see my LinkedIn article.

.NET Encoding

Before we start with the examples, I would like to introduce “System.Text.Encoding”. It is an abstract class that represents a character encoding. It also provides methods to convert strings and arrays of Unicode characters to and from arrays of bytes. Here is the list of its derived classes:

ASCIIEncoding
UTF7Encoding
UTF8Encoding
UnicodeEncoding
UTF32Encoding

See Figure 1 below to visualize the inheritance hierarchy.

Figure 1

If you want to get the derived encoding classes programmatically, see the sample code below:

// Requires: using System; using System.Linq; using System.Reflection; using System.Text;
[TestMethod]
public void Test_Types_Of_Derived_Encoding_Classes()
{
    var type = typeof(Encoding);
    // Look in the assembly that defines Encoding itself
    var assembly = Assembly.GetAssembly(type);
    var derivedClasses = assembly.GetTypes()
        .Where(t => t.IsSubclassOf(type) && t.IsPublic)
        .ToList();

    foreach (var @class in derivedClasses)
    {
        Console.WriteLine(@class.Name);
    }
}

One note to keep in mind: the static properties of the abstract class “System.Text.Encoding”, such as ASCII, UTF8, and UTF7, each return an instance of the corresponding derived class. The instance is created once and then reused. Please see the code below,

Note

The code is based on the source found here.

Therefore, you can either create a new instance of a specific encoding class or use the matching static property on the abstract class, whichever fits your needs.

var utf8 = new UTF8Encoding(); //you can use this
var utf8_2 = Encoding.UTF8; //or you can use this
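
The two options are not completely interchangeable. As a small sketch (the Utf8Instances class and PreambleLength helper are hypothetical names made up for this illustration), the code below compares the byte order mark that each instance reports through GetPreamble:

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of .NET.
public static class Utf8Instances
{
    // Length of the byte order mark (BOM) the encoding would write
    // at the start of a stream; 0 means no BOM is emitted.
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }
}
```

With this helper, PreambleLength(Encoding.UTF8) should return 3 (the bytes EF-BB-BF), while PreambleLength(new UTF8Encoding()) should return 0. GetBytes itself never adds the preamble; it only matters to writers such as StreamWriter that emit it at the start of a file.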

Let us check whether this holds. See the code example below:

[TestMethod]
public void Test_If_Encodings_Are_Same_Type()
{
    Assert.IsInstanceOfType(Encoding.ASCII, typeof(ASCIIEncoding)); //true
    //note: Encoding.UTF7 is marked obsolete in .NET 5 and later
    Assert.IsInstanceOfType(Encoding.UTF7, typeof(UTF7Encoding)); //true
    Assert.IsInstanceOfType(Encoding.UTF8, typeof(UTF8Encoding)); //true
    Assert.IsInstanceOfType(Encoding.Unicode, typeof(UnicodeEncoding)); //true
    Assert.IsInstanceOfType(Encoding.UTF32, typeof(UTF32Encoding)); //true
}

String to Byte Array

To convert a string to a byte array, you need a specific encoding and then call its “GetBytes” method. As it converts the string into a byte array, let us also print each character with its numeric ASCII/Unicode value. Note that ASCII defines only 7-bit values (0 to 127), stored one per byte, while UTF-8 uses 8-bit code units and needs one to four bytes per character.

See the example below,

string strRandomWords = "I Love C#";

[TestMethod]
public void Test_ASCII_Using_GetBytes()
{
    //converts a string into byte array
    var byteResults = Encoding.ASCII.GetBytes(this.strRandomWords);
    Assert.IsTrue(byteResults.Length > 0); //true

    #region iterate
    foreach (var @byte in byteResults)
    {
        string fullResultInString =
            string.Format("Character: {0} in ASCII {1}", (char)@byte, @byte);
        Console.WriteLine(fullResultInString);
    }
    #endregion
}
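
Because ASCII only covers the values 0 to 127, encoding with it can lose information. The sketch below (AsciiLoss and RoundTrip are hypothetical names made up for this illustration) encodes a string with Encoding.ASCII and decodes it again, so you can see what happens to characters outside the ASCII range:

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of .NET.
public static class AsciiLoss
{
    // Encodes the text as ASCII and decodes it straight back.
    // With the default settings of Encoding.ASCII, a character
    // that ASCII cannot represent comes back as '?'.
    public static string RoundTrip(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(bytes);
    }
}
```

RoundTrip("I Love C#") should return the string unchanged, whereas RoundTrip("C# é") should return "C# ?". If your text may contain anything beyond basic English characters, a Unicode encoding such as UTF-8 is the safer choice.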

If you also want the size in bytes, you can use the “GetByteCount” method. It returns the number of bytes the string needs in the given encoding. I decided to double-check that the number of bits matches what we expect. See the two examples below:

string strRandomWords = "I Love C#";

[TestMethod]
public void Test_ASCII_Using_GetByteCount()
{
    //converts a string into byte array
    var byteResults = Encoding.ASCII.GetBytes(this.strRandomWords);
    //get the byte count
    int byteCount = Encoding.ASCII.GetByteCount(this.strRandomWords);
    int totalBits = 0;
    for (int counter = 0; counter < byteResults.Length; counter++)
    {
        //Convert.ToString drops leading zeros, so pad back to the 7 bits of an ASCII value
        string bits = Convert.ToString(byteResults[counter], 2).PadLeft(7, '0');
        totalBits += bits.Length;
    }
    //let’s check if they are equal. 7 is used because every ASCII value fits in 7 bits
    Assert.AreEqual(byteCount, totalBits / 7);
}

//use non Latin alphabet character to test UTF-8
string strRandomNonEnglishStrings = "プログラミングが大好き";

[TestMethod]
public void Test_UTF8_Encoding_Using_GetByteCount()
{
    //converts a string into byte array
    var byteResults = Encoding.UTF8.GetBytes(this.strRandomNonEnglishStrings);
    //get the byte count
    int byteCount = Encoding.UTF8.GetByteCount(this.strRandomNonEnglishStrings);
    int totalBits = 0;
    for (int counter = 0; counter < byteResults.Length; counter++)
    {
        //pad to the full 8 bits of a UTF-8 code unit
        string bits = Convert.ToString(byteResults[counter], 2).PadLeft(8, '0');
        totalBits += bits.Length;
    }
    //let’s check if they are equal. 8 is used because a UTF-8 code unit is one 8-bit byte
    Assert.AreEqual(byteCount, totalBits / 8);
}
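
A common practical use of “GetByteCount” is sizing a buffer before encoding into it. The following is a minimal sketch under that assumption; BufferEncoding and EncodeExact are hypothetical names made up for this illustration, while the GetByteCount and GetBytes overloads are the standard ones on System.Text.Encoding.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of .NET.
public static class BufferEncoding
{
    // Allocates a buffer of exactly the required size, then encodes into it.
    public static byte[] EncodeExact(string text, Encoding encoding)
    {
        var buffer = new byte[encoding.GetByteCount(text)];
        // This overload writes into an existing array and returns the bytes written.
        int written = encoding.GetBytes(text, 0, text.Length, buffer, 0);
        if (written != buffer.Length)
        {
            throw new InvalidOperationException("Byte count did not match the bytes written.");
        }
        return buffer;
    }
}
```

For the 11-character string "プログラミングが大好き", EncodeExact with Encoding.UTF8 should return 33 bytes, since each of those characters takes three bytes in UTF-8.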

Byte-Array to String

The previous examples showed how to get a byte array from a string. Now let us see how to display those bytes as text and how to turn a byte array back into a human-readable string.

To display a series of bytes, we can use “BitConverter”, a helper class that converts base data types to arrays of bytes and arrays of bytes back to base data types. Let us see some examples below:

string strRandomWords = "I Love C#";

[TestMethod]
public void Test_Convert_String_To_Bytes_Formatted()
{
    var bytes = Encoding.UTF8.GetBytes(strRandomWords);
    Assert.IsNotNull(bytes);
    //converts a byte array into a series of hexadecimal byte-strings
    var seriesOfByteStrings = BitConverter.ToString(bytes);
    Assert.IsTrue(!string.IsNullOrWhiteSpace(seriesOfByteStrings));
    Console.WriteLine(seriesOfByteStrings); //49-20-4C-6F-76-65-20-43-23
}

[TestMethod]
public void Test_Convert_To_Bytes_Formatted_Using_Other_Value_Types()
{
    //the leading zero is ignored: this is the decimal value 3291982
    int bday = 03291982;
    var result = BitConverter.GetBytes(bday);
    Assert.IsNotNull(result);
    var seriesOfBytesStrings = BitConverter.ToString(result);
    Assert.IsTrue(!string.IsNullOrWhiteSpace(seriesOfBytesStrings));
    Console.WriteLine(seriesOfBytesStrings); //4E-3B-32-00 on a little-endian machine
}
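
The byte order that “BitConverter” produces depends on the machine, which is why the integer above prints with its lowest byte first. As a rough sketch (IntBytes, RoundTrip, and ToBigEndianHex are hypothetical names made up for this illustration), the code below converts the bytes back to an int and prints them in a fixed, most-significant-byte-first order:

```csharp
using System;

// Hypothetical helper for illustration only; not part of .NET.
public static class IntBytes
{
    // Converts an int to bytes and back again using BitConverter.
    public static int RoundTrip(int value)
    {
        byte[] bytes = BitConverter.GetBytes(value);
        return BitConverter.ToInt32(bytes, 0);
    }

    // Formats the int as hex bytes, most significant byte first,
    // regardless of the machine's native byte order.
    public static string ToBigEndianHex(int value)
    {
        byte[] bytes = BitConverter.GetBytes(value);
        if (BitConverter.IsLittleEndian)
        {
            Array.Reverse(bytes);
        }
        return BitConverter.ToString(bytes);
    }
}
```

Here, RoundTrip(3291982) should return 3291982, and ToBigEndianHex(3291982) should return "00-32-3B-4E", which is the earlier output read in reverse.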

Lastly, we can use “GetString” to decode the bytes back into the original human-readable string. Let us see the example below:

string strRandomWords = "I Love C#";

[TestMethod]
public void Test_ASCII_Using_Get_String()
{
    //converts a string into byte array
    var byteResults = Encoding.ASCII.GetBytes(this.strRandomWords);
    Assert.IsTrue(byteResults.Length > 0); //true

    //decodes the whole array, from index 0 for byteResults.Length bytes
    string humanReadableString =
        Encoding.ASCII.GetString(byteResults, 0, byteResults.Length);
    //expected value first, actual value second
    Assert.AreEqual(strRandomWords, humanReadableString);
}
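
The index and count arguments of “GetString” also let you decode only part of a byte array. Here is a minimal sketch of that idea; ByteSlices and DecodeRange are hypothetical names made up for this illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of .NET.
public static class ByteSlices
{
    // Encodes the text as ASCII, then decodes only the bytes in
    // the range [byteIndex, byteIndex + byteCount).
    public static string DecodeRange(string text, int byteIndex, int byteCount)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(bytes, byteIndex, byteCount);
    }
}
```

For example, DecodeRange("I Love C#", 2, 4) should return "Love". This works cleanly with ASCII because every character is exactly one byte; with a variable-width encoding such as UTF-8, a range could cut a character in half.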

Summary

In this article, we briefly covered the concepts of ASCII and Unicode. We also saw that “System.Text.Encoding” has derived classes such as ASCIIEncoding, UTF7Encoding, UTF8Encoding, UnicodeEncoding, and UTF32Encoding. With those classes in mind, you can either use a static property, Encoding.[Encoding-Type] (e.g. Encoding.ASCII), or create a new instance (e.g. var ascii = new ASCIIEncoding()). After that, we focused on converting a string to a byte array and vice versa.

By the way, most of the source code samples are also available on GitHub. I really enjoyed writing this article, and I hope you enjoyed reading it. Until next time, happy programming.
