Why String Is Immutable

Introduction

In this article, we are going to discuss a famous question - "Why is string immutable in C#?"

What is a string?

A string is a reference data type in C#. A string is a sequential collection of characters that is used to represent text. The value of the String object is the content of the sequential collection of System.Char objects, and that value is immutable (that is, it is read-only).
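As a small illustration of the read-only behavior described above, here is a minimal sketch. The class and method names (ImmutabilityDemo, UpperCase) are made up for this example; only ToUpperInvariant comes from the framework.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Returns both the original text and its upper-cased form.
    // ToUpperInvariant() does not change `text`; it returns a new string.
    public static (string Original, string Upper) UpperCase(string text)
    {
        string upper = text.ToUpperInvariant();

        // text[0] = 'X';  // would not compile: the string indexer is read-only

        return (text, upper);
    }
}
```

Calling ImmutabilityDemo.UpperCase("Siva") gives back "Siva" unchanged together with a separate "SIVA" string.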

Why does CLR create a new object instead of modifying the existing object?

Before answering this question, we need to know how a string is stored in memory. A string is a collection of characters, so its value is stored in memory as a fixed-length sequence of characters. This article models that storage as a character array.

Array

  • An array is a data structure that stores a collection of elements.
  • The size of an array is fixed when it is created. It cannot be changed afterwards.
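The fixed-size rule can be seen with Array.Resize, which looks like it resizes an array but actually allocates a new one. This is a rough sketch; the ArrayDemo class and its method name are assumptions made for the example.

```csharp
using System;

public static class ArrayDemo
{
    // Returns true when "resizing" produced a different array object
    // and left the original array at its original length.
    public static bool ResizeCreatesNewArray()
    {
        char[] original = { 'S', 'i', 'v', 'a' };
        char[] resized = original;        // second reference to the same array

        // Array.Resize allocates a new array of the requested size,
        // copies the elements, and stores the new reference in `resized`.
        Array.Resize(ref resized, 10);

        return !object.ReferenceEquals(original, resized)
            && original.Length == 4
            && resized.Length == 10;
    }
}
```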

Creation of string

Now let's look at what happens when the CLR processes the following statement, which creates a string.

  string testString = "Siva"; // 4 characters

  1. First, the CLR checks the data type. string is a reference type, so the object is stored in heap memory.

  2. Next, the CLR needs to reserve memory for the string value, so it counts the characters in the string: 4 characters, which need 4 x 2 bytes = 8 bytes.

  3. A character array of 8 bytes is created on the heap and “Siva” is stored in it. Let’s assume its memory location is 101.

  4. The CLR assigns 101 to testString as its memory reference.
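The size arithmetic in the steps above can be sketched as follows. The helper name CharacterBytes is hypothetical, and the result covers only the character data; a real string object also carries some per-object overhead that this article's simplified model leaves out.

```csharp
public static class StringSize
{
    // Bytes needed for the characters alone: each char is a
    // 2-byte UTF-16 code unit, so sizeof(char) is 2.
    public static int CharacterBytes(string s) => s.Length * sizeof(char);
}
```

With this helper, StringSize.CharacterBytes("Siva") is 4 x 2 = 8.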

Modification of string

Now let's look at what happens when the CLR processes the following statement, which modifies the string.

  testString = "SivaSankar"; // 10 characters

  1. The CLR cannot fit “SivaSankar” into the existing array, because that array was created with a size of 8 bytes (4 characters).

  2. The new value “SivaSankar” has 10 characters and needs 20 bytes, which cannot fit into an 8-byte array. Even if the new value did fit, the CLR would still create a new array, for the other reasons listed below.

    • A new array is created and “SivaSankar” is stored in it. Let’s assume its memory location is 202.

      Creating a new object instead of modifying the existing array is what makes the string immutable.

    • The CLR removes the memory reference 101 from testString. The old object can be collected by the garbage collector later, once nothing else references it.

    • The CLR assigns 202 to testString as its new memory reference.

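The reassignment described above can be observed by keeping a second reference to the old object. This is a minimal sketch; ReassignmentDemo and the alias variable are illustrative names, not part of the original example.

```csharp
public static class ReassignmentDemo
{
    // Shows that assigning a new value points testString at a different
    // object, while the old object still holds its original characters.
    public static (string Alias, string Current, bool SameObject) Reassign()
    {
        string testString = "Siva";
        string alias = testString;       // second reference to the "Siva" object

        testString = "SivaSankar";       // testString now refers to another object

        return (alias, testString, object.ReferenceEquals(alias, testString));
    }
}
```

The alias still reads "Siva" after the assignment, because the assignment replaced the reference held by testString and did not touch the old object.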

Reason 1 - Array Data structure

Since an array is used to store string values, the CLR has to create a new array every time a string changes, because the size of an array is fixed.

Reason 2 - Security

Many parameters are represented as strings: network connections, database connections, URLs, usernames, passwords, and so on. If strings were mutable, these values could be altered after they had been checked, which may lead to serious issues.
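To make the risk concrete, the sketch below contrasts a mutable char array with a string built from it. Everything here is hypothetical: the IsSafeName rule (reject names containing '/') is just a stand-in for whatever validation a real application would do.

```csharp
using System;

public static class ValidationDemo
{
    // Hypothetical check: a name is "safe" if it contains no path separator.
    public static bool IsSafeName(string name) => name.IndexOf('/') < 0;

    // Builds a string from a buffer, then mutates the buffer afterwards.
    // The string keeps the validated value; the array does not.
    public static (string FromString, string FromArray) Demo()
    {
        char[] buffer = { 'a', 'b', 'c' };
        string validated = new string(buffer);  // copies the characters

        buffer[1] = '/';                        // later change to the array

        return (validated, new string(buffer));
    }
}
```

The validated string stays "abc" no matter what happens to the buffer later, so a check performed on it remains true for as long as the string is used.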

Reason 3 - Synchronization and concurrency

Because a string cannot change after it is created, it is automatically thread safe: it can be shared between threads without any synchronization.
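A rough sketch of this idea: several workers read the same string without a lock. The class and method names are assumptions for the example. Note that the Interlocked call protects the shared counter, not the string.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class SharedStringDemo
{
    // Each worker reads the shared string and adds its length to a total.
    public static int CountAcrossThreads(string shared, int workers)
    {
        int total = 0;

        Parallel.For(0, workers, i =>
        {
            // Reading `shared` needs no lock: no thread can modify it.
            // The counter is mutable, so it is updated atomically.
            Interlocked.Add(ref total, shared.Length);
        });

        return total;
    }
}
```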

Reason 4 - Caching

When two variables hold the same literal value (x = "Siva" and y = "Siva"), only one string object is needed, and both x and y can point to it. This concept is called string interning. Sharing one object in this way is safe only because strings are immutable: if the object could be modified through x, the value seen through y would change as well.
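Interning can be observed with object.ReferenceEquals and string.Intern. The sketch below assumes both literals are compiled into the same assembly; the InterningDemo class and its tuple field names are made up for the example.

```csharp
using System;

public static class InterningDemo
{
    public static (bool LiteralsShared, bool RuntimeCopiesShared, bool InternedShared) Compare()
    {
        string x = "Siva";
        string y = "Siva";   // same literal in the same assembly: same object as x

        // Strings built at run time are separate objects with equal contents.
        string built = new string(new[] { 'S', 'i', 'v', 'a' });
        string builtAgain = new string(new[] { 'S', 'i', 'v', 'a' });

        return (
            object.ReferenceEquals(x, y),
            object.ReferenceEquals(built, builtAgain),
            // string.Intern returns one shared instance per distinct value.
            object.ReferenceEquals(string.Intern(built), string.Intern(builtAgain)));
    }
}
```

Under that assumption, the two literals share one object, the two strings built at run time do not, and interning them yields a single shared instance again.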

Reason 5 - Class loading

Strings are used as arguments for class loading, for example as the name of the class to load. If strings were mutable, the name could be changed and the wrong class could be loaded.
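In .NET, one place where a type is looked up by a string name is Type.GetType. The wrapper below is only a sketch with a hypothetical name (TypeLookupDemo.Resolve); it shows that the string passed in decides which type comes back.

```csharp
using System;

public static class TypeLookupDemo
{
    // Looks up a type by its name. Returns null if no such type is found.
    public static Type Resolve(string typeName) => Type.GetType(typeName);
}
```

For example, TypeLookupDemo.Resolve("System.String") returns the same Type object as typeof(string).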

That's it. I hope you liked it. Please share your valuable comments.