Implementing Equality in C#

Implementing equality can be tricky. If we follow a few simple rules we can avoid the common traps.

Let's say we have a person class that is functioning as an entity object as in the following:

    public class Person
    {
        private readonly String _Identifier;

        public Person(String identifier)
        {
            _Identifier = identifier;
        }

        public String Identifier
        {
            get
            {
                return _Identifier;
            }
        }
    }

We probably want to override equality because, for an entity object, identity is determined by an identifier rather than by where the instance lives in memory. By default, the "Equals()" method on a class only checks whether two references point to the same object. In the code below, the equality check returns "false" even though both objects represent the same entity.

    Person firstPerson = new Person("123-45-6789");

    // a bunch of logic here

    Person secondPerson = new Person("123-45-6789");

    if (firstPerson.Equals(secondPerson)) // will evaluate to 'false'
    {
        // do something important here
    }

The Surface Area

This is a common modeling problem. To address it, we will first implement the "IEquatable<Person>" interface.

    public class Person : IEquatable<Person>
    {
        private readonly String _Identifier;

        public Person(String identifier)
        {
            _Identifier = identifier;
        }

        public String Identifier
        {
            get
            {
                return _Identifier;
            }
        }

        // strongly typed equality from IEquatable<Person>; implemented later
        public bool Equals(Person other)
        {
            throw new NotImplementedException();
        }
    }

Next we will also override the "Equals()" method of the base class "System.Object", from which all classes in C# inherit. If we override "Equals()" we should also override "GetHashCode()".

    public class Person : IEquatable<Person>
    {
        private readonly String _Identifier;

        public Person(String identifier)
        {
            _Identifier = identifier;
        }

        public String Identifier
        {
            get
            {
                return _Identifier;
            }
        }

        public bool Equals(Person other)
        {
            throw new NotImplementedException();
        }

        // placeholders that still use the default behavior; implemented later
        public override bool Equals(object obj)
        {
            return base.Equals(obj);
        }

        public override int GetHashCode()
        {
            return base.GetHashCode();
        }
    }

We also have the option to overload the equality operator to enable syntax like the following:

    Person firstPerson = new Person("123-45-6789");

    // a bunch of logic here

    Person secondPerson = new Person("123-45-6789");

    if (firstPerson == secondPerson)
    {
        // do something
    }

If we are overriding equality, we should be consistent so that misunderstandings do not turn into defects. It would be hard to make the case that the two ways of checking equality should behave differently. One could argue that the equality operator should not be implemented at all, but then "==" silently falls back to comparing references while "Equals()" compares identifiers. If the two ways of evaluating equality behave differently, there is a high risk of defects being introduced by other programmers who assume they behave the same.

Since assumptions will eventually be made either way, it is far less risky to err on the side of consistent behavior. If we override the "Equals()" method and do not implement the equality operator, false assumptions will lead to defects in the code base. If we override the "Equals()" method and also implement the equality operator, the impact of the false assumption that the equality operator is reference checking is much smaller. Besides, if we want to check for reference equality, we should explicitly use the "ReferenceEquals()" method.

    var test1 = firstPerson == secondPerson;
    var test2 = firstPerson.Equals(secondPerson);

Here is the surface of what we will have to implement to support the equality operator.

    public class Person : IEquatable<Person>
    {
        private readonly String _Identifier;

        public Person(String identifier)
        {
            _Identifier = identifier;
        }

        public String Identifier
        {
            get
            {
                return _Identifier;
            }
        }

        public bool Equals(Person other)
        {
            throw new NotImplementedException();
        }

        public override bool Equals(object obj)
        {
            return base.Equals(obj);
        }

        public override int GetHashCode()
        {
            return base.GetHashCode();
        }

        public static Boolean operator == (Person first, Person second)
        {
            throw new NotImplementedException();
        }

        public static Boolean operator != (Person first, Person second)
        {
            throw new NotImplementedException();
        }
    }

Trap #1 

The first problem we run into when implementing the equality operator "==" is that it is easy to inadvertently add infinite recursion.

    public static Boolean operator == (Person first, Person second)
    {
        // "first == null" invokes this same operator again: infinite recursion
        if (first == null && second == null)
        {
            return true;
        }

        // ...
    }

The "first" or "second" parameter could be null. So rather than using the equality operator "==" to check for reference equality, we can do it explicitly using the "ReferenceEquals()" method. If the references are the same, then it is the same instance and will have the same identifier value. This also covers the case where both "first" and "second" are null references, so we return "true". However, if only one of them is null, they cannot be equal. To keep our code DRY, we only check whether "first" is null, so we know we can call "first.Equals()" without throwing an exception, and then pass responsibility for the rest of the evaluation to the "Equals()" method.

    public static Boolean operator == (Person first, Person second)
    {
        if (ReferenceEquals(first, second))
        {
            return true;
        }

        if (ReferenceEquals(first, null))
        {
            return false;
        }

        return first.Equals(second);
    }

Trap #2

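The infinite recursion problem can also span multiple methods, which makes it more subtle. If we have implemented the equality operator "==" as above, but then use "==" inside the "Equals()" method to check for referential equality, the operator calls "Equals()" and "Equals()" calls the operator. A check such as "this == other" then never returns for two distinct instances. The null check below does terminate with the operator exactly as written above, but only because that operator tests "ReferenceEquals()" before delegating, which is a fragile thing to rely on.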

    public bool Equals(Person other)
    {
        // goes through our overloaded operator ==, which in turn calls Equals()
        if (other == null)
        {
            return false;
        }

        // evaluate
    }

To resolve this issue, we can once again use the "ReferenceEquals" method to check for referential equality.

    public bool Equals(Person other)
    {
        if (ReferenceEquals(other, null))
        {
            return false;
        }

        // check identifiers
        var result = _Identifier.Equals(other._Identifier);

        return result;
    }

If we are working with a code base that uses the equality operator "==" to check referential equality, there is a very high risk of introducing infinite recursion if someone later decides to overload "==" as the code base matures. These defects are particularly nasty because they are not caught by the compiler; they only manifest at run time, and only when equality is checked. This leaves traps that may be triggered only in edge cases yet can potentially take our entire application down. Deciphering circular calls that span multiple methods is also not a trivial task.

If you value your weekends like I do and do not like being called at 3am on a Saturday to put out fires in production, the safer bet is to use "ReferenceEquals()" rather than the equality operator "==" to check referential equality.
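
To see that the two checks really do answer different questions, here is a small sketch that uses the built-in string type, which itself overloads "==" to compare contents. The class and method names ("ReferenceVersusOperator", "Run") are only illustrative.

```csharp
using System;

public static class ReferenceVersusOperator
{
    public static void Run()
    {
        // new String(...) allocates a fresh object each time, so a and b
        // hold the same characters but are two separate instances.
        String a = new String('x', 3);
        String b = new String('x', 3);

        Console.WriteLine(a == b);                        // True: string overloads == to compare contents
        Console.WriteLine(Object.ReferenceEquals(a, b));  // False: two different instances
        Console.WriteLine(Object.ReferenceEquals(a, a));  // True: same instance
    }
}
```

Whatever a type does with "==", "ReferenceEquals()" keeps its meaning, which is why it is the safer way to ask "is this the same instance?".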

Trap #3

Another thing we have to be careful of is making sure the identifier is not null. This can be accomplished through guard clauses in the constructor.

    public Person(String identifier)
    {
        if (ReferenceEquals(identifier, null))
        {
            // pass the parameter name, not its (null) value
            throw new ArgumentNullException(nameof(identifier));
        }

        _Identifier = identifier;
    }

If the identifier is a string, we might also want to disallow empty or whitespace-only strings, but that depends on the business domain we are modeling. At the very least we need to check for null to protect the "Equals()" method from throwing a null reference exception.

    if (String.IsNullOrEmpty(identifier)) { }      // true for null and ""
    if (String.IsNullOrWhiteSpace(identifier)) { } // true for null, "" and whitespace-only strings

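
As a sketch of how the two guards might be combined, the constructor below rejects null first and then, assuming the domain treats empty and whitespace-only identifiers as invalid, rejects those too. The exception message is my own wording, not something the domain dictates.

```csharp
using System;

public class Person
{
    private readonly String _Identifier;

    public Person(String identifier)
    {
        // Null gets its own exception type so callers can tell the two failures apart.
        if (ReferenceEquals(identifier, null))
        {
            throw new ArgumentNullException(nameof(identifier));
        }

        // Assumption: the domain treats "" and "   " as invalid identifiers.
        if (String.IsNullOrWhiteSpace(identifier))
        {
            throw new ArgumentException("Identifier must not be empty or whitespace.", nameof(identifier));
        }

        _Identifier = identifier;
    }

    public String Identifier
    {
        get { return _Identifier; }
    }
}
```

If blank identifiers are legitimate in your domain, the second guard can simply be dropped; the null guard is the one that "Equals()" depends on.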
Trap #4

Since we are overriding "Equals()", we should also override "GetHashCode()". If we don't have a good understanding of the implications of overriding "GetHashCode()", it is easy to introduce subtle defects into the system. Hash-based collections such as "Dictionary" and "HashSet" call "GetHashCode()" to decide where to store and look up an instance, so two objects that are equal must return the same hash code. By default, "GetHashCode()" on a class is based on the instance rather than its contents. Here are some of the defects that are easy to introduce.

Let's say we have a "User" class where we have implemented equality:

    class User : IEquatable<User>
    {
        private Int32 _ID;

        public User(Int32 id)
        {
            _ID = id;
        }

        public override bool Equals(object obj)
        {
            return Equals(obj as User);
        }

        public bool Equals(User other)
        {
            // "obj as User" yields null for other types, so guard before dereferencing
            return !ReferenceEquals(other, null) && _ID == other._ID;
        }
    }

If we do not implement "GetHashCode()" then we can get unexpected behavior. For example, if we use our "User" class as a dictionary key, the dictionary first uses the hash code to locate the key, and the default hash code comes from the instance rather than the identifier. Two users that are equal according to "Equals()" can therefore end up in different buckets and never be compared.

    User a = new User(1);
    User b = new User(2);

    var friends = new Dictionary<User, User>() { {a , b} };

    User c = new User(1);

    var areEqual = a.Equals(c); // true

    var containsA = friends.ContainsKey(a); // true
    var containsC = friends.ContainsKey(c); // false

This can create subtle defects in our code base. If we want consistent behavior, "GetHashCode()" should always be overridden whenever we override "Equals()".

    public class User : IEquatable<User>
    {
        private Int32 _ID;

        public User(Int32 id)
        {
            _ID = id;
        }

        public override bool Equals(object obj)
        {
            return Equals(obj as User);
        }

        public bool Equals(User other)
        {
            return !ReferenceEquals(other, null) && _ID == other._ID;
        }

        // equal users share an ID, so they now share a hash code too
        public override int GetHashCode()
        {
            return _ID.GetHashCode();
        }
    }

Now it will behave as expected:

    User a = new User(1);
    User b = new User(2);

    var friends = new Dictionary<User, User>() { {a , b} };

    User c = new User(1);

    var areEqual = a.Equals(c); // true

    var containsA = friends.ContainsKey(a); // true
    var containsC = friends.ContainsKey(c); // true

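
The same contract applies to other hash-based collections. As a minimal sketch, here is the "User" class in a "HashSet"; the "HashSetDemo" helper and its "CountDistinct" method are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public class User : IEquatable<User>
{
    private readonly Int32 _ID;

    public User(Int32 id)
    {
        _ID = id;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as User);
    }

    public bool Equals(User other)
    {
        return !ReferenceEquals(other, null) && _ID == other._ID;
    }

    public override int GetHashCode()
    {
        return _ID.GetHashCode();
    }
}

public static class HashSetDemo
{
    public static int CountDistinct()
    {
        var users = new HashSet<User>();

        users.Add(new User(1));
        users.Add(new User(1)); // same hash code and Equals() is true: rejected as a duplicate
        users.Add(new User(2));

        return users.Count; // 2
    }
}
```

Without the "GetHashCode()" override, the set would in practice keep all three instances, because the two users with ID 1 would be filed under unrelated hash codes.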
Trap #5

A problem that can arise once "GetHashCode()" depends on the identifier is that the identifier may change.

    public class User : IEquatable<User>
    {
        private Int32 _ID;

        public User(Int32 id)
        {
            _ID = id;
        }

        public override bool Equals(object obj)
        {
            return Equals(obj as User);
        }

        public bool Equals(User other)
        {
            return !ReferenceEquals(other, null) && _ID == other._ID;
        }

        public override int GetHashCode()
        {
            return _ID.GetHashCode();
        }

        public Int32 ID
        {
            get
            {
                return _ID;
            }
            set
            {
                _ID = value; // changing the ID also changes the hash code
            }
        }
    }

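If the identifier changes while the instance is stored in a hash-based collection, we can "lose" our instance: it was filed under the old hash code, but lookups now use the new one.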

    User a = new User(1);
    User b = new User(2);

    var friends = new Dictionary<User, User>() { {a , b} };

    var contains1 = friends.ContainsKey(a); // true

    a.ID = 3;

    var contains2 = friends.ContainsKey(a); // false

In order to avoid this behavior, we should always make identifying information immutable in both entity and value objects.

    public class User : IEquatable<User>
    {
        private readonly Int32 _ID; // immutable

        public User(Int32 id)
        {
            _ID = id;
        }

        public override bool Equals(object obj)
        {
            return Equals(obj as User);
        }

        public bool Equals(User other)
        {
            return !ReferenceEquals(other, null) && _ID == other._ID;
        }

        public override int GetHashCode()
        {
            return _ID.GetHashCode();
        }

        public Int32 ID
        {
            get
            {
                return _ID;
            }
        } // immutable: no setter
    }

Wrap-Up

To finish our implementation, we just need to implement the overridden "Equals(object)" method by soft casting the argument to "Person" and then calling the "IEquatable<Person>.Equals()" method. If we are passed anything other than a "Person" instance, the soft cast ("as") returns null, and the typed "Equals()" method then returns "false".

    public override bool Equals(object obj)
    {
        return Equals(obj as Person);
    }

Finally, we implement the inequality operator "!=", which is required when we implement the equality operator "==".

    public static Boolean operator != (Person first, Person second)
    {
        return !(first == second);
    }

This is how I arrived at my recommended implementation of equality.
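
For reference, here is a sketch of how the pieces above might fit together in one class. Everything comes from the fragments shown earlier except the "GetHashCode()" body, which is an assumption on my part that follows the "User" example, since it was never shown for "Person".

```csharp
using System;

public class Person : IEquatable<Person>
{
    private readonly String _Identifier; // immutable, so the hash code never changes

    public Person(String identifier)
    {
        if (ReferenceEquals(identifier, null))
        {
            throw new ArgumentNullException(nameof(identifier));
        }

        _Identifier = identifier;
    }

    public String Identifier
    {
        get { return _Identifier; }
    }

    public bool Equals(Person other)
    {
        // ReferenceEquals avoids routing back through operator ==
        if (ReferenceEquals(other, null))
        {
            return false;
        }

        return _Identifier.Equals(other._Identifier);
    }

    public override bool Equals(object obj)
    {
        // anything that is not a Person becomes null and compares as false
        return Equals(obj as Person);
    }

    // Assumption: hash on the identifier, mirroring the User example.
    public override int GetHashCode()
    {
        return _Identifier.GetHashCode();
    }

    public static Boolean operator ==(Person first, Person second)
    {
        if (ReferenceEquals(first, second)) return true;  // same instance, or both null
        if (ReferenceEquals(first, null)) return false;   // only first is null
        return first.Equals(second);                      // Equals() handles a null second
    }

    public static Boolean operator !=(Person first, Person second)
    {
        return !(first == second);
    }
}
```

With this in place, two "Person" instances built from the same identifier compare as equal through both "==" and "Equals()", and comparing against null in either position returns "false" without throwing.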

[Original article]

Until next time

Happy Coding
