C# String Interning For Efficient String Comparison

In this article, we are going to cover string interning, an important feature of the .NET Framework from the perspective of string comparison.

When comparing strings, we must think about performance in terms of both memory and time. A shaky grasp of the underlying concepts can lead to performance penalties.

I will cover the following topics to make it a one-stop article for string interning and its benefits and associated issues.

  • Introduction to String Interning
    • String intern at Compile time
    • String intern at Run time
  • Methods in String Interning
  • Performance Analysis with and without String interning (time and memory)
  • Issues with String Interning

Introduction to String Interning

If the same string literal appears multiple times in an assembly, the Common Language Runtime (CLR) keeps only one instance of that string, and every occurrence of the literal refers to it. Internally, the CLR maintains a table known as the "intern pool", which holds a single reference to each unique literal string declared or created programmatically in the program.

String Interning at Compile Time

Code Example

    string myName = "Atul";
    string YourName = "Atul";
    // Concatenations of literals are constant expressions; the compiler folds them into "Atul"
    string name1 = "A" + "t" + "u" + "l";
    string name2 = "A" + "tul";

    // Each call checks whether both variables point to the same string object
    Console.WriteLine(object.ReferenceEquals(myName, YourName));
    Console.WriteLine(object.ReferenceEquals(myName, name1));
    Console.WriteLine(object.ReferenceEquals(myName, name2));

In this example, the string variable myName is assigned the literal "Atul". YourName is assigned the same literal, so it refers to the same string object as myName. The concatenations assigned to name1 and name2 consist only of literals, so the compiler folds each of them into the single literal "Atul", and they too refer to that object. The program's output confirms this: all three lines print True, which means all four variables refer to the same memory location.

By default, each unique string literal is created once, and that one reference is handed to every variable initialized with the same literal. Please note that the comparison is case-sensitive, so Atul and ATUL are treated as different string literals.
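To make the boundary of compile-time interning concrete, here is a minimal sketch. The class and method names are hypothetical and not part of the original sample. It assumes the usual behavior where literals in the same assembly share one pooled object, and it contrasts a folded constant expression with a concatenation that involves a non-constant value.

```csharp
using System;

// Hypothetical demo class; names are illustrative only.
public static class CompileTimeInterningDemo
{
    // Both operands are constants, so the compiler folds the expression into the literal "Atul".
    public static bool FoldedConstantSharesReference()
    {
        const string first = "A";
        const string rest = "tul";
        string folded = first + rest;
        return object.ReferenceEquals("Atul", folded);
    }

    // 'first' is a parameter, not a constant, so the concatenation runs at run time
    // and produces a new string object that is not taken from the intern pool.
    public static bool RuntimeConcatSharesReference(string first)
    {
        string built = first + "tul";
        return object.ReferenceEquals("Atul", built);
    }

    // Literals that differ only in case are different strings, hence different objects.
    public static bool DifferentCaseSharesReference()
    {
        return object.ReferenceEquals("Atul", "ATUL");
    }
}
```

Calling FoldedConstantSharesReference() should report a shared reference, while RuntimeConcatSharesReference("A") and DifferentCaseSharesReference() should not.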

Important Catch

A string has to exist before it can be interned. For the literals in the above example this costs nothing extra, because the compiler has already reduced YourName, name1, and name2 to the same literal as myName. For a string built at run time, however, memory is first allocated for the new string; if its value is then found in the intern pool, the new copy is NOT referenced anymore and gets cleaned up in a later garbage collection run.

String Interning at Run Time

As we saw, string literals are interned by default at compile time, but strings created at run time are a different story. If we run the following code example, every line of output is False.

    // Requires: using System.Text; (myName comes from the previous example)
    StringBuilder sb1 = new StringBuilder("A");
    StringBuilder sb2 = new StringBuilder("Atul");
    StringBuilder sb3 = new StringBuilder();

    // Each of these strings is built at run time, so each is a new object
    string name3 = string.Format("{0}", "Atul");
    string name4 = string.Format($"{"Atul"}");

    string name5 = sb1.Append("t").Append("ul").ToString();
    string name6 = sb2.ToString();
    string name7 = sb3.Append("Atul").ToString();

    Console.WriteLine(object.ReferenceEquals(myName, name3));
    Console.WriteLine(object.ReferenceEquals(myName, name4));
    Console.WriteLine(object.ReferenceEquals(myName, name5));
    Console.WriteLine(object.ReferenceEquals(myName, name6));
    Console.WriteLine(object.ReferenceEquals(myName, name7));

And here, we get into trouble. Every string has the same value, yet each one is stored as a separate object. If a program holds many strings with the same value, this can waste a significant amount of memory.

To rescue us from this scenario, we can use string.Intern to get the same intern pool behavior at run time. The code is shown below.

    // string.Intern returns the pooled reference for each run-time string
    string name31 = string.Intern(string.Format("{0}", "Atul"));
    string name41 = string.Intern(string.Format($"{"Atul"}"));
    string name51 = string.Intern(name5);
    string name61 = string.Intern(sb2.ToString());
    string name71 = string.Intern(sb3.ToString());

    Console.WriteLine(object.ReferenceEquals(myName, name31));
    Console.WriteLine(object.ReferenceEquals(myName, name41));
    Console.WriteLine(object.ReferenceEquals(myName, name51));
    Console.WriteLine(object.ReferenceEquals(myName, name61));
    Console.WriteLine(object.ReferenceEquals(myName, name71));

And here, every line of output is True. Amazed? Now all of these variables refer to the same memory location as the first variable, myName. The temporary strings that were built and passed to string.Intern for name31, name41, name51, name61, and name71 are no longer referenced and will be freed by garbage collection.
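The memory effect is easier to see with many copies of the same value. The following is a minimal sketch under stated assumptions: BuildCopies and CountDistinctObjects are hypothetical helpers introduced only for illustration, and new string(char[]) is used simply as a way to force a fresh object for each copy.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; names are illustrative only.
public static class RuntimeInterningDemo
{
    // Builds 'count' strings with the same content at run time,
    // optionally replacing each one with its pooled reference.
    public static List<string> BuildCopies(string value, int count, bool intern)
    {
        var result = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            // A fresh object for every iteration (for a non-empty value).
            string copy = new string(value.ToCharArray());
            result.Add(intern ? string.Intern(copy) : copy);
        }
        return result;
    }

    // Counts distinct string objects by reference, not by value.
    public static int CountDistinctObjects(List<string> items)
    {
        int distinct = 0;
        for (int i = 0; i < items.Count; i++)
        {
            bool seenBefore = false;
            for (int j = 0; j < i; j++)
            {
                if (object.ReferenceEquals(items[i], items[j]))
                {
                    seenBefore = true;
                    break;
                }
            }
            if (!seenBefore) distinct++;
        }
        return distinct;
    }
}
```

With BuildCopies("Atul", 5, false) the list holds five separate objects; with intern set to true, all five entries share one pooled object and the temporary copies become garbage.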

IsInterned and Intern Methods

Within the string interning family, we get two methods: String.IsInterned and String.Intern.

string.IsInterned looks up a string in the intern pool and returns the pooled reference if one exists.

Caution: do NOT be misled by the name of the method; it does NOT return a Boolean.

string.Intern also returns the pooled reference for the string it is given.

The major difference between string.IsInterned and string.Intern is that the first one returns null if the string is not interned, while the latter (string.Intern) creates a new entry in the intern pool and returns that reference.
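As a compact illustration of that difference, here is a hedged sketch. The helper names are hypothetical, and the GUID-based value is only an assumption used to obtain a string that is very unlikely to be in the pool already.

```csharp
using System;

// Hypothetical demo class; names are illustrative only.
public static class InternMethodsDemo
{
    // Produces a run-time string that is assumed not to be in the intern pool yet.
    public static string NewUniqueString()
    {
        return "name-" + Guid.NewGuid().ToString("N");
    }

    // IsInterned only looks the value up; it never adds anything to the pool.
    public static bool IsInPool(string value)
    {
        return string.IsInterned(value) != null;
    }

    // Intern adds the value to the pool if needed and returns the pooled reference.
    public static string AddToPool(string value)
    {
        return string.Intern(value);
    }
}
```

A typical sequence would be: IsInPool(s) is false for a fresh value from NewUniqueString(), AddToPool(s) returns the pooled reference, and after that IsInPool(s) is true.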

Let us examine the code and verify the facts said above with output.

  1. Console.WriteLine("String Intern methods ...");  
  2. Console.WriteLine("IsInterned Static");  
  3. Console.WriteLine(string.IsInterned(YourName));  
  4. Console.WriteLine(string.IsInterned(name1));  
  5. Console.WriteLine(string.IsInterned(name2));  
  6. Console.WriteLine(string.IsInterned(name3));  
  7.   
  8. Console.WriteLine("IsInterned Dynamic");  
  9. Console.WriteLine(string.IsInterned(name3));  
  10. Console.WriteLine(string.IsInterned(name4));  
  11. Console.WriteLine(string.IsInterned(name5));  
  12. Console.WriteLine(string.IsInterned(name6));  
  13. Console.WriteLine(string.IsInterned(name7));  
  14.   
  15. Console.WriteLine("string.IsInterned");  
  16. Console.WriteLine(string.IsInterned(name31));  
  17. Console.WriteLine(string.IsInterned(name41));  
  18. Console.WriteLine(string.IsInterned(name51));  
  19. Console.WriteLine(string.IsInterned(name61));  
  20. Console.WriteLine(string.IsInterned(name1 + "Sharma"));  
  21. Console.WriteLine(string.IsInterned(name71));  
  22.   
  23. Console.WriteLine("string.Intern");  
  24. Console.WriteLine(string.Intern(name31));  
  25. Console.WriteLine(string.Intern(name41));  
  26. Console.WriteLine(string.Intern(name51));  
  27. Console.WriteLine(string.Intern(name61));  
  28. Console.WriteLine(string.Intern(name71));  
  29. Console.WriteLine(string.Intern(name1 + "Sharma"));  

And here, we get the output.

[Image: console output of the code above]

Here, we see that all the strings are already interned, so string.IsInterned returns the same string value, Atul, for each of them. The one exception is the code at line # 20, i.e., Console.WriteLine(string.IsInterned(name1 + "Sharma")); which returns null (printed as an empty line).

Here, the string is built at run time (by concatenating name1 + "Sharma") and is not in the intern pool, so the call returns null. Had I used a hard-coded literal instead (making it a compile-time constant), it would already have an entry in the intern pool and that entry would be returned. So, this proves our statement that if string.IsInterned is passed a string that is not interned, it returns null.

On the other hand, look at the code at line # 29, i.e., Console.WriteLine(string.Intern(name1 + "Sharma")); and the last line of the output. Even though the string is not in the intern pool, string.Intern creates a new entry and returns that reference.

So, now we understand the clear difference between the two methods.

Performance Comparison with and without String Interning

To evaluate the scenario, I am going to compare two string values with and without string interning, and then discuss the time and memory used in both scenarios. Each comparison is repeated 100 million times. For that purpose, I have this code:

    // Requires: using System; using System.Diagnostics;
    static void CompareWithStringIntern()
    {
        Console.WriteLine("CompareWithStringIntern()");
        string source = "Atul";
        // Built at run time, then replaced by the pooled reference
        string target = string.Intern(string.Format($"{"Atul"}"));
        bool isEqual = false;
        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < 100000000; i++)
        {
            if (source == target)
                isEqual = true;
            else
                isEqual = false;
        }
        sw.Stop();
        Console.WriteLine($"Time - {sw.ElapsedTicks}");
        Console.WriteLine($"Memory - {GC.GetTotalMemory(true)}");
    }

    static void CompareWithoutStringIntern()
    {
        Console.WriteLine("CompareWithoutStringIntern()");
        string source = "Atul";
        // Built at run time and left as a separate object
        string target = string.Format($"{"Atul"}");
        bool isEqual = false;
        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < 100000000; i++)
        {
            if (source == target)
                isEqual = true;
            else
                isEqual = false;
        }

        sw.Stop();
        Console.WriteLine($"Time - {sw.ElapsedTicks}");
        Console.WriteLine($"Memory - {GC.GetTotalMemory(true)}");
    }

And, here is the output.

[Image: Performance evaluation of string comparison with and without string interning]

Conclusion

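It is evident from the output that interning gives a large boost in comparison time (more than 2 times in this run). The == operator on strings first checks whether both operands are the same reference, and two interned strings with the same value pass that check immediately, without a character-by-character comparison.

On memory, the run with interning reports slightly more than the run without. Both methods run in the same scope, and GC.GetTotalMemory(true) forces a collection before it measures, so garbage collection effectively runs once here. In an actual real-life scenario, the unused memory (created while instantiating the strings) will be cleaned up over several GC executions and may pass through all generations.

Hence, it is good practice to use string interning when the same string values are compared repeatedly.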
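A minimal sketch of how this could be applied is shown below. CountMatches is a hypothetical helper, not part of the benchmark above. It interns the search key once and still compares with ==, so the result stays correct for values that were never interned; interned values simply hit the reference check first.

```csharp
using System;

// Hypothetical demo class; names are illustrative only.
public static class InternedLookupDemo
{
    // Counts how many entries equal 'key'.
    public static int CountMatches(string[] values, string key)
    {
        // Intern the key once, outside the loop.
        string pooledKey = string.Intern(key);
        int matches = 0;
        foreach (string value in values)
        {
            // == compares by value; when both sides are the same pooled object,
            // the reference check inside the equality test succeeds right away.
            if (value == pooledKey)
                matches++;
        }
        return matches;
    }
}
```

If the values themselves are interned when they are loaded, each successful comparison inside the loop is just a reference check.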

Issues with String Interning

First, the intern pool is not scoped to a single assembly or even to an application domain. The CLR's reference to an interned string can outlive both, so the memory used by interned strings is unlikely to be released by garbage collection until the CLR itself terminates.

The second issue is the one mentioned under Important Catch: memory is still allocated for the new string, which is then abandoned and becomes eligible for garbage collection as soon as a matching entry is found in the intern pool.
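The second issue can be observed directly. The sketch below uses a hypothetical helper, TemporaryWasDiscarded, that first makes sure the pool has an entry for a value, then builds a fresh copy and interns it. If string.Intern hands back a different object than the one passed in, the fresh copy was allocated only to be thrown away.

```csharp
using System;

// Hypothetical demo class; names are illustrative only.
public static class InternAllocationDemo
{
    // Returns true when Intern returned a different object than the one passed in,
    // meaning the freshly allocated copy is now unreferenced garbage.
    // Intended for non-empty values.
    public static bool TemporaryWasDiscarded(string value)
    {
        // Make sure the pool already holds an entry for this value.
        string.Intern(value);

        // A fresh allocation with the same content.
        string temporary = new string(value.ToCharArray());

        // The pool already has a match, so the existing entry comes back instead.
        string pooled = string.Intern(temporary);
        return !object.ReferenceEquals(temporary, pooled);
    }
}
```

For a non-empty value such as "Atul", this is expected to return true: the allocation happened even though the pooled entry was the one that ended up being used.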

With this, I hope I have explained the concept clearly and left no confusion. Should you have any doubts, please write to me. The source code used in this article is available here for experimentation.

The article was originally published at taagung.

Want to learn more about strings? Try Strings in C# Tutorials.

Reference

https://docs.microsoft.com/en-us/dotnet/api/system.string.intern?view=netframework-4.7.2