Do you know string interning? Let's learn it

In this post, we will learn about string interning.

Strings are one of the first things a developer learns in any programming language, and C# is no exception. Although many of us feel that we understand strings well and make the best use of them, sometimes that is not the case. Today I am going to talk about one more concept, called string interning. I am sure some of you are not aware of it; I was not aware of it either until a few months ago. It won't affect you directly, but learning it will surely make you a better developer.

So let's start from the very basics.

Note: As you probably know, string objects are immutable. An existing string object can never be modified; any operation that appears to change a string actually creates a new string.
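To make immutability concrete, here is a minimal sketch (the class name ImmutabilityDemo is just for illustration). Calling Replace does not touch the original string; it returns a new string object.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Hypothetical helper: "modifies" a string and reports what happened.
    public static (string Original, string Modified, bool SameObject) Run()
    {
        string original = "Hi";
        // Replace does not change `original`; it returns a brand-new string.
        string modified = original.Replace("Hi", "Hello");
        return (original, modified, Object.ReferenceEquals(original, modified));
    }

    public static void Main()
    {
        var (original, modified, sameObject) = Run();
        Console.WriteLine(original);   // Hi    (unchanged)
        Console.WriteLine(modified);   // Hello (a new object)
        Console.WriteLine(sameObject); // False
    }
}
```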

Let’s discuss various scenarios.

Scenario 1:
  string s1 = "Hi";
  string s2 = "Hi";

Are the above two the same string?
Obviously, yes.

But are they the same object? In other words, do both variables s1 and s2 point to the same memory location?

Confusing, but the answer is yes.

Let’s just check it.

  Console.WriteLine(Object.ReferenceEquals(s1, s2));

When we execute the above line of code, it displays True. What is this line actually doing? Object.ReferenceEquals checks whether the two arguments refer to the same object, that is, the same memory location. If they do, it returns true; otherwise, it returns false.
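The difference between value equality (==) and reference equality (Object.ReferenceEquals) is the heart of this post, so here is a small sketch that reports both. EqualityDemo and Compare are hypothetical names, and StringBuilder is used only as a convenient way to build a string at run time.

```csharp
using System;
using System.Text;

public static class EqualityDemo
{
    // Compares two strings by value (==) and by identity (Object.ReferenceEquals).
    public static (bool ValueEqual, bool SameObject) Compare(string a, string b)
    {
        return (a == b, Object.ReferenceEquals(a, b));
    }

    public static void Main()
    {
        string s1 = "Hi";
        // StringBuilder.ToString() allocates a new string at run time,
        // so s2 has the same characters as s1 but is a separate object.
        string s2 = new StringBuilder().Append('H').Append('i').ToString();

        Console.WriteLine(Compare(s1, "Hi")); // (True, True)   both are the interned literal
        Console.WriteLine(Compare(s1, s2));   // (True, False)  same value, different objects
    }
}
```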

We'll discuss later in the post how C# handles this.
Let's take another scenario.

Scenario 2:

  string s1 = "Hi Ram";
  string s2 = "Hi " + "Ram";

  Console.WriteLine(Object.ReferenceEquals(s1, s2));

Now what will it print?

True or false?

Confused again? It prints True again, because both variables point to the same object. We'll discuss the reason later in the post.

Let’s take another scenario 
Scenario 3:
  string p = "Ram";
  string s1 = "Hi Ram";
  string s2 = "Hi " + p;

  Console.WriteLine(Object.ReferenceEquals(s1, s2));

What will be the output of the above code?

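True? Wrong.

This time it prints False. s1 and s2 hold the same value, "Hi Ram", but they are two different objects in memory: the literal "Hi Ram" and the new string produced when the concatenation runs.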
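Here is scenario 3 as a complete program, a sketch with a hypothetical Scenario3 class, printing both comparisons: the values match, but the references do not.

```csharp
using System;

public static class Scenario3
{
    public static (bool ValueEqual, bool SameObject) Run()
    {
        string p = "Ram";       // interned literal, but p itself is an ordinary variable
        string s1 = "Hi Ram";   // interned literal
        string s2 = "Hi " + p;  // not a constant expression: concatenated at run time
        return (s1 == s2, Object.ReferenceEquals(s1, s2));
    }

    public static void Main()
    {
        var (valueEqual, sameObject) = Run();
        Console.WriteLine(valueEqual); // True
        Console.WriteLine(sameObject); // False
    }
}
```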

Let's take one more scenario.
Scenario 4:
  string s1 = "Hi";
  string s2 = string.Copy(s1);

  Console.WriteLine(Object.ReferenceEquals(s1, s2));

Here, s1 and s2 have the same value, but when we run the above code, it prints False.
That is because s1 and s2 no longer point to the same object: string.Copy creates a new string object on the heap with the same value as its argument.
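We can also ask the runtime which object actually sits in the intern pool. In this sketch (Scenario4 is a hypothetical name), string.IsInterned returns the pooled instance for a value, or null if there is none. The copy is made with new string(s1.ToCharArray()) rather than string.Copy, because newer .NET versions mark string.Copy as obsolete; both allocate a new object.

```csharp
using System;

public static class Scenario4
{
    public static (bool CopyIsSameObject, bool PoolHoldsOriginal) Run()
    {
        string s1 = "Hi";                          // literal: lives in the intern pool
        string s2 = new string(s1.ToCharArray());  // fresh object with the same characters

        // Returns the pooled instance for this value, or null if none exists.
        var pooled = string.IsInterned(s2);

        return (Object.ReferenceEquals(s1, s2), Object.ReferenceEquals(pooled, s1));
    }

    public static void Main()
    {
        var (copyIsSameObject, poolHoldsOriginal) = Run();
        Console.WriteLine(copyIsSameObject);  // False: the copy is a separate object
        Console.WriteLine(poolHoldsOriginal); // True: the pool holds s1, not the copy
    }
}
```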
Let's understand how all these scenarios work.

In scenario 1, when the code is compiled, the compiler sees that both variables are assigned the same constant value. String literals are interned, so both variables end up referring to the same object. Scenario 2 works the same way: "Hi " + "Ram" is a constant expression, so the compiler evaluates it at compile time, and s2 gets the same literal, "Hi Ram", as s1.

In scenario 3, however, we concatenate a constant with a string variable and assign the result to s2. Although p always holds "Ram", it is an ordinary local variable, not a const, so the expression "Hi " + p is not a constant expression. It is evaluated at run time and produces a new object, which is why ReferenceEquals returns false. Sharing a single object for equal strings, as in scenarios 1 and 2, is called string interning, and for literals and constant expressions the compiler and runtime do it implicitly.
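The difference between scenarios 2 and 3 is whether the compiler can treat the expression as a constant. As a sketch (ConstConcat is a hypothetical name), declaring p as const turns scenario 3 back into a constant expression, and the two variables share one object again.

```csharp
using System;

public static class ConstConcat
{
    public static bool Run()
    {
        const string p = "Ram";  // compile-time constant (scenario 3 used a plain variable)
        string s1 = "Hi Ram";
        string s2 = "Hi " + p;   // constant expression: the compiler folds it into "Hi Ram"
        return Object.ReferenceEquals(s1, s2);
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // True
    }
}
```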

In scenario 4, we explicitly ask for a copy of the object, so two objects are created in memory, which is expected.
C# also lets us request interning explicitly. In scenario 3, s1 and s2 have the same value but are stored at different memory locations, because "Hi " + p is not a constant expression and so, per the C# specification, does not qualify for implicit interning. To intern the result explicitly, we can change the code as follows:
  string p = "Ram";
  string s1 = "Hi Ram";
  string s2 = string.Intern("Hi " + p);

  Console.WriteLine(Object.ReferenceEquals(s1, s2));

Now it returns True again. string.Intern checks the runtime's intern pool for a string with the same value; because the literal "Hi Ram" is already there, it returns that existing object, so s2 refers to the same object as s1. So if we create the same value many times and intern it (implicitly, or explicitly when required), every variable shares one object in memory. This can reduce memory usage in certain scenarios, at the cost of a pool lookup for each explicit string.Intern call.
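To see the memory effect, here is a sketch that builds many equal strings at run time and counts how many separate objects end up being referenced, with and without string.Intern. InternDemo, CountObjects, and the "user-" keys are hypothetical; ByReference is a small comparer that treats two strings as equal only if they are the same object.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class InternDemo
{
    // Compares strings by object identity instead of by value.
    private sealed class ByReference : IEqualityComparer<string>
    {
        public bool Equals(string x, string y) => Object.ReferenceEquals(x, y);
        public int GetHashCode(string s) => RuntimeHelpers.GetHashCode(s);
    }

    // Builds `count` strings at run time that cycle through `distinct` values
    // and returns how many separate objects ended up being referenced.
    public static int CountObjects(int count, int distinct, bool intern)
    {
        var objects = new HashSet<string>(new ByReference());
        for (int i = 0; i < count; i++)
        {
            string key = "user-" + (i % distinct);          // a new object every iteration
            objects.Add(intern ? string.Intern(key) : key); // Intern returns the pooled copy
        }
        return objects.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountObjects(1000, 3, intern: false)); // 1000
        Console.WriteLine(CountObjects(1000, 3, intern: true));  // 3
    }
}
```

Without interning, every iteration keeps its own object alive; with interning, all equal values collapse onto one pooled instance per distinct value.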

I hope you now have a better idea of string interning.