The Null Problem in C#

When writing software, there are mismatches between the modeling tools and languages we use and the things we are modeling. We create entities in code as part of the model we have built, and they are stored in memory in the execution context of our application. Rather than passing the whole entity around, which would mean copying it, we pass around references to the place in memory where it is stored. The mismatch comes into play when a reference or pointer does not point to anything.

Null References

For example, let's say we have a person:

    public class Person
    {
        private readonly Int32 _Id;
        private readonly String _Name;

        // Public so that callers can actually create a Person.
        public Person(Int32 id, String name)
        {
            _Id = id;
            _Name = name;
        }

        public Int32 Id
        {
            get
            {
                return _Id;
            }
        }

        public String Name
        {
            get
            {
                return _Name;
            }
        }
    }

When we have a method that takes this Person class as a parameter, we don't know if it will be a Person or a null reference.

    public void Display(Person person)
    {
        Console.WriteLine(person.Name + ":" + person.Id);
    }

The problem arises when a member (for example Name or Id) is accessed through a null reference: the application throws a NullReferenceException. There are a couple of reasons this is a problem. The most obvious is that in a long method it can take some time to track down the source. The stack trace points at the line that writes to the console, where the reference was used, not at the place where the null came from.

Because we know this is an invalid state for the method, we should help to make debugging easier by adding a guard clause to assert that the value coming in is valid.

    public void Display(Person person)
    {
        if (ReferenceEquals(person, null))
        {
            throw new ArgumentNullException("person");
        }

        Console.WriteLine(person.Name + ":" + person.Id);
    }

This way the actual problem surfaces sooner, and we can assume the "person" reference is safe to use in the rest of the method. As a rule of thumb, high quality code bases have one assertion for every six lines of code. Verifying parameters at the top of a method is one of the more important coding standard rules I enforce on the teams I manage to help drive quality.
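
One way to keep guard clauses short enough that they get written every time is to pull the check into a small helper. The sketch below assumes a hypothetical Guard.NotNull method (it is not part of the code above or of the .NET base class library), and it uses a String argument as a stand-in for Person so that the example is self-contained.

```csharp
using System;

public static class Guard
{
    // Hypothetical helper: throws if the reference is null, otherwise returns it.
    public static T NotNull<T>(T value, String parameterName) where T : class
    {
        if (Object.ReferenceEquals(value, null))
        {
            throw new ArgumentNullException(parameterName);
        }

        return value;
    }
}

public static class Greeter
{
    public static String Greet(String name)
    {
        // Enforce correct consumption
        Guard.NotNull(name, nameof(name));

        // Business logic starts here
        return "Hello, " + name;
    }
}
```

Calling Greeter.Greet(null) fails immediately with an ArgumentNullException that names the "name" parameter, instead of failing later wherever the value happens to be used. On .NET 6 and later, the built-in ArgumentNullException.ThrowIfNull method covers the same need.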

Null is Another Type of State


The second, more subtle, problem we have is that now there is another state we have to deal with which drives complexity into the system.

Looking at the simplest case, a nullable boolean has three states we have to worry about. Boolean is a CLR value type and cannot be null. But if we make it nullable with either syntax below, we now have a third state that has to be addressed.

    // Long form
    Nullable<Boolean> a = true;
    Nullable<Boolean> b = false;
    Nullable<Boolean> c = null;

    // Shorthand for the same type
    Boolean? x = true;
    Boolean? y = false;
    Boolean? z = null;

Now we have to handle three cases instead of two.

    public void CheckSomething(Boolean? value)
    {
        if (ReferenceEquals(value, null)) // or !value.HasValue
        {
            // do something for the extra case
        }
        else if (value.Value)
        {
            // handle 'true' case
        }
        else
        {
            // handle 'false' case
        }
    }

This is significantly more complex than just dealing with a boolean.

    public void CheckSomething(Boolean value)
    {
        if (value)
        {
            // handle 'true' case
        }
        else
        {
            // handle 'false' case
        }
    }

Many programmers do not know exactly how this extra state behaves. Do you know with absolute certainty what each variable will be here for each possible input?

    public static void GetState(Boolean? value)
    {
        var refEqualsNull = ReferenceEquals(value, null);
        var equalsNull = value == null;
        var hasValue = value.HasValue;
    }

If you have to stop and think about it, the code is not as clear as it could be, and there is a higher probability of defects working their way into the code base as it matures.
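
For the record, the three checks can be run side by side. The sketch below wraps the same three expressions in a small harness; the NullableStates class, the Describe method and the tuple return type are introduced here for illustration and assume C# 7 or later. Passing a Boolean? to ReferenceEquals boxes it, and a nullable without a value boxes to a null reference, so all three checks agree on whether a value is present.

```csharp
using System;

public static class NullableStates
{
    public static (Boolean RefEqualsNull, Boolean EqualsNull, Boolean HasValue) Describe(Boolean? value)
    {
        // The nullable is boxed to Object before the reference comparison.
        var refEqualsNull = Object.ReferenceEquals(value, null);

        // Comparing a nullable to the null literal is equivalent to !value.HasValue.
        var equalsNull = value == null;

        // HasValue is a property of the Nullable<T> struct, so reading it
        // never throws, even when no value is present.
        var hasValue = value.HasValue;

        return (refEqualsNull, equalsNull, hasValue);
    }
}
```

Describe(null) yields (true, true, false), while Describe(true) and Describe(false) both yield (false, false, true). A related surprise is that ReferenceEquals(value, value) is false whenever value holds a value, because each argument is boxed separately.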

Modeling and the Null State


The bigger problem arises when the null state is not aligned with the model we are trying to build, which is intended to represent a small slice of reality in order to provide business value. Good code is closely aligned with the domain it is modeling. Great code is part of a ubiquitous language that permeates not only all the technical layers but also determines how business and technical experts talk about the system. The concept of "null" is a part of the computer science domain that leaks into the business domain being modeled. It is not well-aligned with business concepts.

For example, suppose we are discussing how automobiles and roads are related with a non-technical business domain expert, and we use the following code to clarify the conversation:

    var car = new Automobile();
    var road = new Road();

    if (car.CanTravelOn(road))
    {
        // do something
    }

Once we try to account for the implications of null, it gets hairy.

    public void Traverse(Automobile car, Road[] route)
    {
        if (car == null) // equality as a concept does not make sense here
        {
            // handle null case
        }

        foreach (var road in route)
        {
            if (road == null) // now we are outside the domain
            {
                // handle null case
            }

            if (car.CanTravelOn(road))
            {
                // do something
            }
        }
    }

Equality does not make sense as a concept to the business expert here. The technician will understand that equality "==" in this case most likely (but not always) is comparing a pointer or reference. But now we have a cluttered code base that is not well aligned with the domain. Equality should be reserved for really checking if two things are equal from a business perspective and not a reference check. Otherwise, we are forced to jump in and out of the model and we risk equality meaning two different things.

Instead of allowing nulls throughout our code base and checking for them repeatedly, it is clearer to use a Special Case or, more specifically, the Null Object pattern, as described by Martin Fowler. If this is rigorously enforced through the code base, our in-line null checks are no longer necessary. In addition, if we use explicit language that is clearly checking references instead of using equality "==", there is less chance for miscommunication with business experts. If we are consistent about placing guards at the top of every method to enforce correct consumption, and don't let the guard clauses mix with the business logic, the code will be much clearer from a business perspective.

    public class NoRoad : Road // instead of passing nulls, pass this
    {
        // Assumes Road declares Miles as a virtual property.
        public override Int32 Miles
        {
            get
            {
                return 0;
            }
        }
    }

    public void Traverse(Automobile car, Road[] route)
    {
        // Enforce correct consumption
        if (ReferenceEquals(car, null))
        {
            throw new ArgumentNullException("car");
        }

        // Business logic starts here
        foreach (var road in route)
        {
            if (car.CanTravelOn(road))
            {
                // do something
            }
        }
    }

    // Option: store a NullObject reference to use instead of throwing.
    public void Traverse2(Automobile car, Road[] route)
    {
        // Make sure we have a car
        var safeCar = car ?? this.MissingCar; // NullObject stored at the class level

        // Business logic starts here
        foreach (var road in route)
        {
            if (safeCar.CanTravelOn(road))
            {
                // do something
            }
        }
    }

Special Case

"A subclass that provides special behavior for particular cases." (Martin Fowler)
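
To make the idea concrete, here is a minimal self-contained sketch of a Null Object for the road example. The members given to Road and Automobile (Name, Miles, IsDrivable, TotalMiles) and the NoRoad.Instance singleton are assumptions made for illustration, since the article does not define those classes.

```csharp
using System;
using System.Collections.Generic;

public class Road
{
    public Road(String name, Int32 miles)
    {
        Name = name;
        Miles = miles;
    }

    public String Name { get; }
    public Int32 Miles { get; }

    public virtual Boolean IsDrivable
    {
        get { return true; }
    }
}

// Null Object: stands in for "no road" so callers never receive null.
public sealed class NoRoad : Road
{
    public static readonly NoRoad Instance = new NoRoad();

    private NoRoad() : base("(no road)", 0)
    {
    }

    public override Boolean IsDrivable
    {
        get { return false; }
    }
}

public class Automobile
{
    public Boolean CanTravelOn(Road road)
    {
        return road.IsDrivable;
    }

    public Int32 TotalMiles(IEnumerable<Road> route)
    {
        // Enforce correct consumption
        if (ReferenceEquals(route, null))
        {
            throw new ArgumentNullException(nameof(route));
        }

        // Business logic starts here: no in-line null checks on the elements
        var total = 0;
        foreach (var road in route)
        {
            if (CanTravelOn(road))
            {
                total += road.Miles;
            }
        }

        return total;
    }
}
```

A route containing a 10 mile road, NoRoad.Instance and a 5 mile road totals 15 miles; the missing segment is simply skipped by the same business rule that handles every other road. This only works if code that would otherwise return null consistently returns NoRoad.Instance instead.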

Indeterminate Behavior With "==" Operator and ".Equals()" Method

According to the Microsoft guidelines:

Unlike the Equals method and the equality operator, the "ReferenceEquals()" method cannot be overridden. Because of this, if you want to test two object references for equality and you are unsure about the implementation of the Equals method, you can call the "ReferenceEquals()" method.

Any healthy code base will change over time. We cannot guarantee that the Equals() methods or equality operators "==" will have consistent behavior over time. Therefore, to have a more stable code base it is better to have explicit reference checks rather than relying on indeterminate behavior. In the worst case, indeterminate code throughout the code base will cause defects that are very hard to diagnose, because a change in one class changes behavior in unknown places, and a small change can have a butterfly effect across the code base.
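
As a hypothetical illustration of that drift, consider a class whose "==" operator was overloaded for business equality by someone who did not think about null. The Money and NullChecks classes below are invented for this sketch; the point is only that "==" runs user code, while "ReferenceEquals()" does not.

```csharp
using System;

public sealed class Money
{
    public Money(Decimal amount)
    {
        Amount = amount;
    }

    public Decimal Amount { get; }

    // Hypothetical, deliberately careless overload: a missing operand is
    // treated as zero, so a valid zero amount compares equal to null.
    public static Boolean operator ==(Money left, Money right)
    {
        var leftAmount = ReferenceEquals(left, null) ? 0m : left.Amount;
        var rightAmount = ReferenceEquals(right, null) ? 0m : right.Amount;
        return leftAmount == rightAmount;
    }

    public static Boolean operator !=(Money left, Money right)
    {
        return !(left == right);
    }

    public override Boolean Equals(Object obj)
    {
        var other = obj as Money;
        return !ReferenceEquals(other, null) && other.Amount == Amount;
    }

    public override Int32 GetHashCode()
    {
        return Amount.GetHashCode();
    }
}

public static class NullChecks
{
    public static Boolean LooksNullByOperator(Money money)
    {
        return money == null; // runs the overloaded operator above
    }

    public static Boolean IsNullByReference(Money money)
    {
        return Object.ReferenceEquals(money, null); // cannot be redefined
    }
}
```

For new Money(0m), LooksNullByOperator returns true even though the reference is perfectly valid, while IsNullByReference returns false. In C# 7 and later, the "money is null" pattern is another check that does not call an overloaded "==".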

Whether equality and the equality operator should be overridden at all is open to argument. In many cases overloading operators does not make sense, but this is a debate that has to be settled on a case-by-case basis. Because operator overloading is a language feature in C#, the possibility always exists. We can carefully inspect all changes going into our code bases to try to avoid equality overrides where we have decided against them. While auditing can help, it does not guarantee a stable code base. Regardless of where the debate lands on the usage of this language feature, if we want a stable code base it is better to use the "ReferenceEquals()" method to check for null references.

Recommendation For Working With Null

Keep code determinate, simple and well aligned with business. This can be accomplished by explicitly checking for null. If we are talking about equality, try to leave null out of the equation.

This will keep your code base stable, reduce the chance for defects to be introduced and facilitate development momentum.

Until next time

Happy Coding
