Story Of Equality In .NET - Part One

Background

A few months back I followed a very interesting Pluralsight course by Simon Robinson about equality and comparison in .NET. It is a great course that helps developers understand how equality works in .NET, so I thought I would share what I learned from it. I hope this helps others understand in depth how .NET handles equality.
 
Introduction

The purpose of this post is to outline and explore some of the issues that make equality much more complex than you might expect. We will examine the difference between value equality and reference equality, and why equality and inheritance do not work well together. So let’s get started.

We will start from a simple example that will compare two numbers. For instance, let’s say that three is less than four; conceptually it is trivial and the code for this is also very easy and simple, as shown below:

    if (3 < 4)
    {

    }

Equality is less straightforward. If you look at the System.Object class, from which all other types inherit, you will find the following four methods related to equality:
  1. static Equals()
  2. virtual Equals()
  3. static ReferenceEquals()
  4. virtual GetHashCode()
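
As a rough illustration of the four methods listed above, the following sketch calls each of them on a small class. The `Point` type is hypothetical and exists only for this example; it does not override anything, so it shows the default behaviour inherited from System.Object.

```csharp
using System;

// Hypothetical type used only for this illustration.
class Point
{
    public int X;
    public int Y;
}

static class ObjectMethodsDemo
{
    static void Main()
    {
        var a = new Point { X = 1, Y = 2 };
        var b = new Point { X = 1, Y = 2 };
        var c = a; // second reference to the same instance

        // virtual Equals(): Point does not override it, so the
        // default implementation compares references.
        Console.WriteLine(a.Equals(b));                  // False
        Console.WriteLine(a.Equals(c));                  // True

        // static Equals(): null-safe helper that ends up calling
        // the virtual Equals() when neither argument is null.
        Console.WriteLine(object.Equals(a, b));          // False
        Console.WriteLine(object.Equals(null, null));    // True

        // static ReferenceEquals(): always compares identity.
        Console.WriteLine(object.ReferenceEquals(a, c)); // True

        // virtual GetHashCode(): the same object always yields the same hash code.
        Console.WriteLine(a.GetHashCode() == c.GetHashCode()); // True
    }
}
```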
In addition to this, Microsoft has provided nine different interfaces for performing equality or comparison of types.

Most of these methods and interfaces come with a risk: if you override or implement them incorrectly, you will introduce bugs in your code and may also break the existing collections provided by the framework, which depend on them.

We will see what the purpose of these methods and interfaces is and how to use them correctly. We will also focus on how to provide custom implementations of equality and comparison in the right way, so that they perform efficiently, follow best practices, and most importantly do not break the implementation of other types.

Equality is Complex/Difficult

There are four reasons that make equality more complex than you might expect and these are as follows:

  1. Reference v/s Value Equality
  2. Multiple ways to Compare Values
  3. Accuracy
  4. Conflict with OOP

Reference V/S Value Equality

The first issue is reference versus value equality. Equality can be treated either way, and unfortunately C# has no syntax that distinguishes between the two, which can cause unexpected behavior if you don’t understand how the various operators and methods work.

As you know, in C# a variable of a reference type does not contain the actual value; it contains a pointer to the location in memory that holds the value. This means that for reference types there are two possible ways to measure equality.

You can ask whether both variables refer to the same location in memory, which is called reference equality and is also known as identity. Or you can ask whether the locations to which the variables point contain the same value, even if they are different locations, which is called value equality.

We can illustrate using the following example:

    using System;

    class Person
    {
        public string Name { get; set; }
    }

    class Program
    {
        static void Main(String[] args)
        {
            Person p1 = new Person();
            p1.Name = "Ehsan Sajjad";

            Person p2 = new Person();
            p2.Name = "Ehsan Sajjad";

            // == on Person compares references, not the Name values
            Console.WriteLine(p1 == p2);
            Console.ReadKey();
        }
    }

As you can see in the above example, we have instantiated two objects of the Person class, and both contain the same value for the Name property. The two instances look identical because they contain the same values, but are they really equal? When we compare the two instances using the C# equality operator and run the example code, it prints False on the console, which means that they are not considered equal.

This is because, for the Person class, both C# and the .NET Framework consider equality to be reference equality. In other words, the equality operator checks whether the two variables refer to the same location in memory. In this example they are not equal: the two instances of the Person class hold identical values, but they are separate instances, so the variables p1 and p2 refer to different locations in memory.

Reference equality is very quick to evaluate, because you only need to check one thing: whether the two variables hold the same memory address. Comparing values can be a lot slower.

For example, if the Person class held several fields and properties instead of just one, and you wanted to check whether two instances had the same values, you would have to check every field and property. There is no operator in C# that checks the value equality of two Person instances. That is reasonable, because comparing whether two instances of a class like Person contain exactly the same values is not the sort of thing that you would normally want to do.
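
To make the cost of a member-by-member check concrete, here is a minimal sketch. The `Age` property and the `HaveSameValues` helper are assumptions added for illustration; they are not part of the original example.

```csharp
using System;

// Hypothetical Person with more than one property.
class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

static class PersonComparison
{
    // Hypothetical helper: value equality written out by hand.
    public static bool HaveSameValues(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;    // same instance, or both null
        if (x is null || y is null) return false;  // exactly one is null
        return x.Name == y.Name && x.Age == y.Age; // every member must match
    }

    static void Main()
    {
        var p1 = new Person { Name = "Ehsan Sajjad", Age = 30 };
        var p2 = new Person { Name = "Ehsan Sajjad", Age = 30 };

        Console.WriteLine(p1 == p2);               // False: reference equality
        Console.WriteLine(HaveSameValues(p1, p2)); // True: same member values
    }
}
```

Every member added to the class has to be added to the helper as well, which is one reason value comparison is both slower and easier to get wrong than a reference check.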

Now take this code as an example:

    using System;

    class Program
    {
        static void Main(String[] args)
        {
            string s1 = "Ehsan Sajjad";

            // Creates a new string object with the same characters
            // (string.Copy is marked obsolete in newer .NET versions)
            string s2 = string.Copy(s1);

            Console.WriteLine(s1 == s2);
            Console.ReadKey();
        }
    }

The code shown above is quite similar to the previous example, but in this case we are applying the equality operator to identical strings. We instantiated a string and stored its reference in a variable named s1, then created a copy of its value and stored that in another variable, s2. If we run this code, the output shows that the two strings are considered equal.


If the equality operator had been checking for reference equality, we would have seen False printed on the console for this program, but for strings the == operator compares the values of the operands.

Microsoft implemented it this way because checking whether one string has the same value as another is something a programmer needs to do very often.
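
The difference between the two kinds of equality for strings can be seen side by side in a short sketch. It builds the second string with `new string(char[])` instead of `string.Copy`, which is an assumption made here only to guarantee a separate object at run time.

```csharp
using System;

static class StringEqualityDemo
{
    static void Main()
    {
        string s1 = "Ehsan Sajjad";

        // Build a second string object with the same characters at run time.
        string s2 = new string(s1.ToCharArray());

        Console.WriteLine(s1 == s2);                       // True: the values match
        Console.WriteLine(object.ReferenceEquals(s1, s2)); // False: two separate objects
    }
}
```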

Reference and Value Types

The reference versus value issue only exists for reference types. For unboxed value types such as int and float, the variable directly contains the value and there are no references, which means that equality can only mean comparing values.

The code given below compares two integers. The check evaluates to true, because the equality operator compares the values that are stored in the variables.

    using System;

    class Program
    {
        static void Main(String[] args)
        {
            int num1 = 2;

            int num2 = 2;

            Console.WriteLine(num1 == num2);

            Console.ReadKey();
        }
    }

Hence, in the code shown above, the equality operator compares the value stored in variable num1 with the value stored in num2.

However, suppose we modify this code and cast both variables to object, as in the following lines of code:

    int num1 = 2;

    int num2 = 2;

    Console.WriteLine((object)num1 == (object)num2);

Now, if we run the code, you will see that the result contradicts what we got from the first version. In this second version, the comparison returns false. That happens because object is a reference type, so when we cast an integer to object, the value is boxed into a new object and we get a reference to it. The second version therefore compares references, not values, and it returns false because the two integers are boxed into different instances.

This is something that a lot of developers don’t expect. Normally we don’t cast value types to object, but there is another common scenario: casting a value type to an interface. To illustrate, let’s modify the example code to cast the integer variables to IComparable<int>, an interface provided by the .NET Framework that the integer type implements:

    Console.WriteLine((IComparable<int>)num1 == (IComparable<int>)num2);


In .NET, interfaces are always reference types, so the above line of code involves boxing too and if we run this code, we will find that this equality check also returns false and it’s because this is again checking for reference equality.

Hence, you need to be careful when casting value types to interfaces, because an equality check with the == operator on the results will always be a reference equality check.
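
A small sketch of the boxing behaviour described above. It also shows that the virtual `Equals()` method still compares values on boxed integers, because Int32 overrides it; the variable names are chosen only for this illustration.

```csharp
using System;

static class BoxingDemo
{
    static void Main()
    {
        int num1 = 2;
        int num2 = 2;

        object boxed1 = num1; // boxing: a new object is allocated
        object boxed2 = num2; // boxing: a second, separate object

        Console.WriteLine(boxed1 == boxed2);              // False: compares references
        Console.WriteLine(boxed1.Equals(boxed2));         // True: Int32 overrides Equals to compare values
        Console.WriteLine(object.Equals(boxed1, boxed2)); // True: forwards to the virtual Equals
    }
}
```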
 
== Operator 
 
None of this would probably have been a problem if C# had different operators for value equality and reference equality, but it does not, which some developers consider a problem. C# has just one equality operator, and there is no obvious way to tell upfront what the operator is actually going to do for a given type.

For instance, consider this line of code:

    Console.WriteLine(var1 == var2);

We cannot tell what the equality operator will do in the above code without knowing the types involved. You simply have to know what the equality operator does for a given type; there is no way around it, because that is how C# is designed.
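
The following sketch illustrates the point: the same two objects give different results depending on the compile-time type of the variables being compared. The variable names are made up for this example.

```csharp
using System;

static class OperatorDependsOnTypeDemo
{
    static void Main()
    {
        string s1 = "Equality";
        string s2 = new string(s1.ToCharArray()); // same characters, separate object

        // Compile-time type is string: the string == operator compares values.
        Console.WriteLine(s1 == s2); // True

        object o1 = s1;
        object o2 = s2;

        // Compile-time type is object: == compares references,
        // even though both variables still refer to strings.
        Console.WriteLine(o1 == o2); // False
    }
}
```

The operator is chosen by the compiler from the declared types of the operands, not from the run-time types of the objects.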

In this post, we will go through what the equality operator does and how it works under the hood in detail. After reading the complete post, I hope you will have a better understanding of what actually happens when you write an equality check, and that you will be able to tell how equality between two objects is evaluated whenever you come across code that compares them.

Different Ways to Compare Values

Another source of complexity is that there is often more than one way to compare the values of a given type. The String type is the best example of this. Suppose we have two string variables that contain the same value, as shown below:

    string s1 = "Equality";

    string s2 = "Equality";

Now, if you compare both s1 and s2, we should expect that the result would be true for the equality check. It means that we should consider these two variables to be equal.

I am sure that if you look at them as two string variables that contain exactly the same value, it makes sense to consider them equal, and indeed that is what C# does. But what if I change the case of one of them to make them different, as shown below:

    string s1 = "EQUALITY";

    string s2 = "equality";

Now should these two strings be considered equal? In C#, the equality operator will evaluate to false, saying that the two strings are not equal. But if we are asking not about the C# equality operator but about the principle, whether we should consider those two strings equal, then we cannot really answer, because it depends entirely on the context whether we should consider or ignore case. Let’s say we have a database of food items and the user is searching for a food item; chances are that we want to ignore case and treat both strings as equal. But if the user is typing in a password to log into an application and you have to check whether the password entered is correct, then you should certainly not consider the lower case and upper case strings equal.

The equality operator for the strings in C# is always case sensitive, so you can’t use it for the comparison and ignore the case. If you want to ignore the case, you can do it, but you will have to call the special methods, which are defined in the String type. For example,

    string s1 = "EQUALITY";

    string s2 = "equality";

    if (s1.Equals(s2, StringComparison.OrdinalIgnoreCase))
    {
        // Reached: the comparison ignores case
    }

The above example will evaluate the statement as true as we are telling it to ignore the case when doing a comparison for equality between s1 and s2.

Now, I am sure that none of that will surprise you. Case sensitivity is an issue that almost everyone encounters when they program. The example illustrates a wider point about equality in general: equality is not absolute in programming; it is often context-sensitive (e.g. case-sensitivity of strings).

Another example: a user is searching for an item in a shopping cart web application and types the item name with extra whitespace in it. When we compare that input with the items in our database, should we consider the item in our database equal to the item entered by the user with whitespace? Normally we would consider them equal and display the item in the search results, which again illustrates that equality is context-sensitive.

Let’s take one more example. Consider the following two database records:

    ID    Name         Price  LastUpdated
    3211  Cold Coffee  $2     1 Jan 2015

    ID    Name         Price  LastUpdated
    3211  Cold Coffee  $2.5   2 Jan 2016
  
Are they equal? In one sense, yes. These are obviously the same record: they refer to the same drink item and they have the same primary key. But a couple of column values are different. Clearly the first record holds the data before an update and the second holds the data after it. This illustrates another conceptual issue with equality, which comes into play when you are updating data: do you care about the precise values of the record, or do you care whether it is the same record? Clearly there is no one right answer. Once again, it depends on the context of what you are trying to do.
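
The two possible answers can be written down as two separate checks. This is only a sketch: the `MenuItem` class and the `SameRecord` and `SameValues` helpers are hypothetical names introduced here to model the rows shown above.

```csharp
using System;

// Hypothetical model of the database rows.
class MenuItem
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
    public DateTime LastUpdated { get; set; }
}

static class RecordEquality
{
    // "Is it the same record?" Only the primary key matters.
    public static bool SameRecord(MenuItem a, MenuItem b) => a.Id == b.Id;

    // "Does it hold exactly the same data?" Every column matters.
    public static bool SameValues(MenuItem a, MenuItem b) =>
        a.Id == b.Id && a.Name == b.Name &&
        a.Price == b.Price && a.LastUpdated == b.LastUpdated;

    static void Main()
    {
        var before = new MenuItem { Id = 3211, Name = "Cold Coffee", Price = 2.0m, LastUpdated = new DateTime(2015, 1, 1) };
        var after = new MenuItem { Id = 3211, Name = "Cold Coffee", Price = 2.5m, LastUpdated = new DateTime(2016, 1, 2) };

        Console.WriteLine(SameRecord(before, after)); // True
        Console.WriteLine(SameValues(before, after)); // False
    }
}
```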

Equality and Comparison

The way .NET deals with multiple meanings of equality is quite neat. .NET allows each type to specify its own single natural way of measuring equality for that type. So, for example, the String type defines its natural equality as two strings containing exactly the same sequence of characters. That is why comparing two strings that differ only in case returns false: “equality” is not equal to “EQUALITY”, because lower case and upper case letters are different characters.

It is very common for types to expose their natural way of determining equality by means of a generic interface called IEquatable<T>. String implements this interface. Separately, .NET also provides a mechanism for plugging in a different implementation of equality if you don’t like the type’s own definition or if it does not fulfill your needs.

This mechanism is based on Equality Comparers. An Equality Comparer is an object, whose purpose is to test whether the instances of a type are equal, using the definition provided by the Comparer to check the equality.

Equality Comparers implement an interface called IEqualityComparer<T>. For example, if you want to compare strings while ignoring extra white space, you could write an Equality Comparer that knows how to do that and then use it instead of the equality operator, as required.
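
A minimal sketch of such a comparer follows. The class name `IgnoreWhitespaceComparer` is hypothetical, and for simplicity it ignores all whitespace, not just leading or trailing whitespace; a real application might want a narrower rule.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two strings are equal when they match
// after every whitespace character has been removed.
sealed class IgnoreWhitespaceComparer : IEqualityComparer<string>
{
    private static string Strip(string s) =>
        s == null ? null : new string(s.Where(c => !char.IsWhiteSpace(c)).ToArray());

    public bool Equals(string x, string y) =>
        string.Equals(Strip(x), Strip(y), StringComparison.Ordinal);

    // Must agree with Equals: strings that compare equal get the same hash code.
    public int GetHashCode(string s) =>
        s == null ? 0 : Strip(s).GetHashCode();
}

static class ComparerDemo
{
    static void Main()
    {
        // The comparer is passed to the collection, which then uses it for lookups.
        var catalog = new HashSet<string>(new IgnoreWhitespaceComparer()) { "Cold Coffee" };

        Console.WriteLine("Cold Coffee" == " Cold  Coffee ");  // False: natural string equality
        Console.WriteLine(catalog.Contains(" Cold  Coffee ")); // True: the plugged-in definition
    }
}
```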

Things work basically the same way for ordering comparisons. The main difference is that you use different interfaces. .NET provides an interface called IComparable<T>, which lets a type define its own less than or greater than comparison. Separately, you can write what are known as comparers, which implement IComparer<T> and define an alternative ordering for a type. We will see how to implement these interfaces in another post.
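
Until then, here is a rough sketch of how the two ordering interfaces fit together. The `Product` type and the `ProductNameComparer` class are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type whose natural ordering is by price.
class Product : IComparable<Product>
{
    public string Name { get; set; }
    public decimal Price { get; set; }

    // By convention, any instance sorts after null.
    public int CompareTo(Product other) =>
        other is null ? 1 : Price.CompareTo(other.Price);
}

// Alternative ordering, supplied from outside the type.
sealed class ProductNameComparer : IComparer<Product>
{
    public int Compare(Product x, Product y) =>
        string.Compare(x?.Name, y?.Name, StringComparison.Ordinal);
}

static class OrderingDemo
{
    static void Main()
    {
        var products = new List<Product>
        {
            new Product { Name = "Tea", Price = 1.5m },
            new Product { Name = "Cold Coffee", Price = 2.5m },
        };

        products.Sort();                          // natural order: by price
        Console.WriteLine(products[0].Name);      // Tea

        products.Sort(new ProductNameComparer()); // alternative order: by name
        Console.WriteLine(products[0].Name);      // Cold Coffee
    }
}
```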

Equality for Floating Points

Some data types are inherently approximate. In .NET, you will encounter this problem with floating point types like float, double or decimal or any type that contains a floating point type as a member field. Let’s have a look at an example.

    float num1 = 2.0000000f;
    float num2 = 2.0000001f;
    Console.WriteLine(num1 == num2);

We have two floating point numbers that are nearly equal. So are they equal? Looking at the literals, it seems obvious that they are not, because they differ in the final digit. Yet when we run the code, which prints the result of the equality check on the console, the program displays True.


The program says that both are equal, which completely contradicts what we concluded by looking at the numbers, and you can probably guess what the problem is. Computers can only store numbers to a certain level of accuracy, and the float type simply cannot store enough significant digits to distinguish these two particular numbers. It can work the other way around too.
 
For example:

    double num1 = 0.1;
    double num2 = 0.2;

    var sum = num1 + num2;

    Console.WriteLine(sum);
    Console.WriteLine(sum == 0.3);

This is a simple calculation in which we add 0.1 to 0.2. It looks obvious that adding those two numbers gives 0.3, so the program adds them and then checks whether the sum is equal to 0.3. The output contradicts what we expected: it says the sum is not equal to 0.3. The reason is that rounding errors in the floating point arithmetic leave the result holding a number that is very close to 0.3, so close that some .NET versions even display it as 0.3 in Console.WriteLine, but it is still not quite equal to 0.3.


Those rounding errors in floating point arithmetic have resulted in the program giving the opposite answer to what common sense reasoning would tell you. This is an inherent difficulty with floating point numbers: rounding errors mean that testing for equality often gives you the wrong result, and .NET has no built-in solution for this. Don’t compare floating point numbers for exact equality, because the results might not be what you predict. This only applies to equality. The problem does not normally affect less than and greater than comparisons; in most cases there is no problem in checking whether one floating point number is greater than or less than another.
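
A common workaround, which is not something the framework provides for you, is to compare against a tolerance instead of testing for exact equality. The sketch below assumes a hypothetical helper called `NearlyEqual`; the right tolerance depends entirely on the application.

```csharp
using System;

static class FloatComparison
{
    // Hypothetical helper: true when a and b differ by no more than tolerance.
    public static bool NearlyEqual(double a, double b, double tolerance) =>
        Math.Abs(a - b) <= tolerance;

    static void Main()
    {
        double sum = 1.1 + 2.2;

        Console.WriteLine(sum == 3.3);                  // False: exact comparison is hit by rounding
        Console.WriteLine(NearlyEqual(sum, 3.3, 1e-9)); // True: the difference is far below the tolerance
    }
}
```

A fixed absolute tolerance like this is only reasonable when the magnitude of the values is known in advance; for values of widely varying size a relative tolerance is usually the better choice.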

Equality Conflicts with Object Oriented Principles

This one often comes as a surprise even to experienced developers: there is in fact a fundamental conflict between equality comparisons, type safety, and good object oriented practices. These three things do not sit well together, which often makes it very hard to get equality right and bug free even once you resolve the other issues.

We will not go into much detail here, as it will be easier to understand once we start seriously coding, which I will demonstrate in a separate post. You will then be able to see how the problem naturally arises in the code you write.

For now, let’s just get a rough idea of the conflict. Let’s say we have a base class Animal, which represents different animals, and a derived class, for example Dog, which adds information specific to dogs.

    public class Animal
    {

    }

    public class Dog : Animal
    {

    }

If we wanted the Animal class to declare that Animal instances know how to check whether they are equal to other Animal instances, we might have it implement IEquatable<Animal>. This requires it to implement an Equals() method that takes an Animal instance as a parameter:

    public class Animal : IEquatable<Animal>
    {

        public virtual bool Equals(Animal other)
        {
            throw new NotImplementedException();
        }
    }

If we want the Dog class to also declare that Dog instances know how to check whether they are equal to other Dog instances, we would probably have it implement IEquatable<Dog>, which means it will implement a similar Equals() method that takes a Dog instance as a parameter.

    public class Dog : Animal, IEquatable<Dog>
    {

        public virtual bool Equals(Dog other)
        {
            throw new NotImplementedException();
        }
    }

And this is where the problem comes in. In well-designed OOP code, you would expect the Dog class to override the Equals() method of the Animal class. The trouble is that the Dog Equals() method has a different parameter type than the Animal Equals() method, which means it does not override it; it only adds an overload. If you are not very careful, that can cause subtle bugs where you end up calling the wrong Equals() method and so return the wrong result.
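
The kind of subtle bug described above can be reproduced with a small sketch. The `Name` and `Breed` properties and the method bodies are assumptions added for illustration; the original classes only throw NotImplementedException.

```csharp
using System;

class Animal : IEquatable<Animal>
{
    public string Name { get; set; }

    public virtual bool Equals(Animal other) =>
        other != null && Name == other.Name;
}

class Dog : Animal, IEquatable<Dog>
{
    public string Breed { get; set; }

    // An overload, not an override: the parameter type differs
    // from Animal.Equals(Animal).
    public virtual bool Equals(Dog other) =>
        other != null && Name == other.Name && Breed == other.Breed;
}

static class OverloadPitfall
{
    static void Main()
    {
        Dog d1 = new Dog { Name = "Rex", Breed = "Beagle" };
        Dog d2 = new Dog { Name = "Rex", Breed = "Poodle" };

        // Variables typed as Dog: Dog.Equals(Dog) is chosen and compares Breed.
        Console.WriteLine(d1.Equals(d2)); // False

        Animal a1 = d1;
        Animal a2 = d2;

        // Same objects, variables typed as Animal: Animal.Equals(Animal)
        // is chosen, and it knows nothing about Breed.
        Console.WriteLine(a1.Equals(a2)); // True
    }
}
```

The same two objects are reported as both equal and not equal, depending only on the declared type of the variables.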

Often the only workaround is to give up type safety, and that is exactly what you see in the Equals method of the Object type, which is the most basic way for types to implement equality.

    // Signature as declared on System.Object; the body is omitted here
    class Object
    {
        public virtual bool Equals(object obj)
        {
            // ...
        }
    }

This method takes an instance of type object as its parameter, which means it is not type-safe, but it works correctly with inheritance. This problem is not well known. There have been blog posts that give incorrect advice on how to implement equality because they do not take this issue into account. It is a real problem, and we should be careful to design our code to avoid it.
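
As a rough sketch of why the object-typed method cooperates with inheritance, the version below overrides Equals(object) in both classes. The properties and the exact-type check are assumptions made for this illustration and are only one of several possible designs.

```csharp
using System;

class Animal
{
    public string Name { get; set; }

    public override bool Equals(object obj) =>
        obj != null
        && obj.GetType() == GetType()   // both objects must have the same run-time type
        && Name == ((Animal)obj).Name;

    // Overriding Equals means GetHashCode must be overridden too.
    public override int GetHashCode() => Name?.GetHashCode() ?? 0;
}

class Dog : Animal
{
    public string Breed { get; set; }

    // A true override: same signature as the base method.
    // The cast is safe because base.Equals already checked the run-time type.
    public override bool Equals(object obj) =>
        base.Equals(obj) && Breed == ((Dog)obj).Breed;

    public override int GetHashCode() =>
        base.GetHashCode() ^ (Breed?.GetHashCode() ?? 0);
}

static class OverrideDemo
{
    static void Main()
    {
        Animal a1 = new Dog { Name = "Rex", Breed = "Beagle" };
        Animal a2 = new Dog { Name = "Rex", Breed = "Poodle" };

        // Virtual dispatch reaches Dog.Equals even through Animal variables.
        Console.WriteLine(a1.Equals(a2)); // False: Breed differs
    }
}
```

The price is the loss of type safety mentioned above: the compiler will accept any object as the argument, so the type check has to happen at run time.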

Summary 
  • C# does not syntactically distinguish between value and reference equality which means it can sometimes be difficult to predict what the equality operator will do in particular situations.

  • There are often multiple legitimate ways of comparing values. .NET addresses this by allowing types to specify their preferred natural way to compare for equality, and by providing a mechanism to write equality comparers that allow you to replace the default equality for each type.

  • It is not recommended to test floating point values for equality because rounding errors can make this unreliable.

  • There is an inherent conflict between implementing equality, type-safety and good Object Oriented practices.
 