Leader Election With RedLock.net


Microservice architecture is widely adopted these days. One of its benefits is that it offers the possibility of horizontal scaling, which allows us to increase the performance of our application dramatically. However, there are situations in which multiple instances of a service contend for a shared resource.
Consider a service that, among other responsibilities, runs a mission-critical job once per day, and that job must be executed by a single instance. At the same time, deploying only a single instance is counterproductive, since the microservice has other functionality that would benefit from horizontal scaling. One may argue that we could split such a microservice into even smaller microservices, but I'd caution against making microservices too granular.
Another use case is the poor man's failover scenario, where one instance of a service does the work while another sits idle, ready to take over if the first one fails for some reason.
As a solution, I propose electing a single leader that handles the shared resource exclusively at any given point in time.
Well-known leader election algorithms such as the Bully algorithm or the Ring algorithm require a lot of ceremony, as well as knowledge of the logical topology of your system, to implement. That's why we'll look at leader election using a distributed lock instead. You should use this pattern when the tasks in a distributed application need careful coordination and there's no natural leader.
As storage for the distributed lock, we'll use Redis. Redis is an in-memory key-value store, so we'll take advantage of its speed. There is already a library that implements a distributed lock over Redis, so we just have to make use of it.

The code

The sample code can be accessed on GitHub. Let's break down what actually happens here.
The idea behind leader election via a distributed lock is that whoever acquires the lock over the shared resource becomes the leader. So naturally, we have a lock key, which plays a role quite similar to the object you pass to the built-in C# `lock` statement.

    private const string _resource = "the-thing-we-are-locking-on";

Obviously, the storage is a single point of failure, so we have to make sure that it is reliable. RedLock.net, which we use here, lets us use multiple Redis instances instead of a single one in order to improve reliability.
Here's how we create a connection to Redis during startup.
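
    using System.Collections.Generic;
    using System.Net;
    using RedLockNet.SERedis;
    using RedLockNet.SERedis.Configuration;

    // Three independent Redis instances; note the commas between the entries.
    var endPoints = new List<RedLockEndPoint>
    {
        new DnsEndPoint("redis1", 6379),
        new DnsEndPoint("redis2", 6379),
        new DnsEndPoint("redis3", 6379)
    };
    _distributedLockFactory = RedLockFactory.Create(endPoints);
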
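To see why several Redis instances improve reliability, it helps to look at the arithmetic: the Redlock algorithm treats a lock as acquired only when a majority of the independent instances grant it. The sketch below is not RedLock.net code; `QuorumMath` and its members are hypothetical names used purely to illustrate that majority rule.

```csharp
using System;

// Hypothetical helper, not part of RedLock.net: illustrates majority-quorum arithmetic.
public static class QuorumMath
{
    // Smallest number of instances that forms a strict majority.
    public static int Quorum(int instanceCount)
    {
        if (instanceCount < 1)
            throw new ArgumentOutOfRangeException(nameof(instanceCount));
        return instanceCount / 2 + 1;
    }

    // How many instances may be unreachable while a lock can still be acquired.
    public static int ToleratedFailures(int instanceCount) =>
        instanceCount - Quorum(instanceCount);
}
```

With three endpoints, two instances must agree, so one Redis instance can be down without blocking the election. With a single endpoint, no failure is tolerated at all.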
Every instance tries to acquire the lock once per given period of time. If it succeeds, it becomes the leader. If not, it tries again later.
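
    // Lock lifetime; the same number of seconds is used as the retry interval below.
    private readonly TimeSpan _expiry = TimeSpan.FromSeconds(_expirySecondsCount);

    // Try immediately (due time 0), then once per expiry period.
    _acquireLockTimer = new Timer(async state => await TryAcquireLock((CancellationToken)state), _cts.Token, 0, _expirySecondsCount * 1000);
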
However, the leader does not need to re-acquire the lock, since a RedLock.net lock has an auto-extend feature. This can be unintuitive on first encounter with RedLock.net, so it is worth noting. Let's have a look at the TryAcquireLock method.

    private async Task TryAcquireLock(CancellationToken token)
    {
        if (token.IsCancellationRequested)
            return;

        var distributedLock = await _distributedLockFactory.CreateLockAsync(_resource, _expiry);
        if (distributedLock.IsAcquired)
        {
            // Stop retrying first: the lock auto-extends, so no renewal is needed.
            _acquireLockTimer.Dispose();
            DoLeaderJob();
        }
        else
        {
            // Not the leader this round; discard the lock object and retry on the next tick.
            distributedLock.Dispose();
        }
    }

As mentioned above, we get rid of the re-acquire timer as soon as an instance becomes the leader, taking advantage of the auto-extend feature.
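Because the lock keeps extending itself for as long as it is held, it can be worth keeping a reference to it so that a leader shutting down cleanly can hand over leadership right away. The following is a minimal sketch under that assumption, not the sample project's code: the `Leadership` class and its member names are hypothetical, while `IDistributedLockFactory`, `IRedLock`, `CreateLockAsync` and `IsAcquired` come from RedLock.net.

```csharp
using System;
using System.Threading.Tasks;
using RedLockNet;

// Hypothetical wrapper: keeps the acquired lock so it can be released on shutdown.
public sealed class Leadership : IDisposable
{
    private readonly IDistributedLockFactory _factory;
    private IRedLock _heldLock; // non-null only while this instance is the leader

    public Leadership(IDistributedLockFactory factory) => _factory = factory;

    public async Task<bool> TryBecomeLeaderAsync(string resource, TimeSpan expiry)
    {
        var candidate = await _factory.CreateLockAsync(resource, expiry);
        if (!candidate.IsAcquired)
        {
            candidate.Dispose(); // nothing was acquired; just clean up the object
            return false;
        }

        _heldLock = candidate; // auto-extension keeps running while we hold this
        return true;
    }

    // Releases the lock so another instance can take over without waiting for expiry.
    public void Dispose()
    {
        _heldLock?.Dispose();
        _heldLock = null;
    }
}
```

Disposing this wrapper during a graceful shutdown releases the lock explicitly, so the next instance whose timer fires can become the leader without waiting out the expiry period.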
Once the leader instance fails, its lock is no longer extended and expires, at which point it is up for grabs by the other instances, which are still retrying on their timers.
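The takeover after a failure can be illustrated without Redis at all. The sketch below uses a hypothetical in-memory stand-in for the lock store (`InMemoryLockStore` and `TakeoverDemo` are invented for this illustration and are not part of RedLock.net); time is passed in explicitly so the sequence is deterministic. The 30-second expiry is an arbitrary value chosen for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the lock store, for illustration only.
public sealed class InMemoryLockStore
{
    private readonly Dictionary<string, (string Owner, DateTime ExpiresAt)> _locks =
        new Dictionary<string, (string Owner, DateTime ExpiresAt)>();

    // Grants the lock if it is free or has expired; otherwise refuses.
    public bool TryAcquire(string resource, string owner, TimeSpan expiry, DateTime now)
    {
        if (_locks.TryGetValue(resource, out var held) && held.ExpiresAt > now)
            return false;
        _locks[resource] = (owner, now + expiry);
        return true;
    }

    // Pushes the expiry forward, but only for the current, unexpired owner.
    public bool TryExtend(string resource, string owner, TimeSpan expiry, DateTime now)
    {
        if (!_locks.TryGetValue(resource, out var held)
            || held.Owner != owner
            || held.ExpiresAt <= now)
            return false;
        _locks[resource] = (owner, now + expiry);
        return true;
    }
}

public static class TakeoverDemo
{
    // Returns the outcome of each step, in order.
    public static bool[] Run()
    {
        var store = new InMemoryLockStore();
        var expiry = TimeSpan.FromSeconds(30);
        var t0 = new DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        const string resource = "the-thing-we-are-locking-on";

        return new[]
        {
            store.TryAcquire(resource, "A", expiry, t0),                // A becomes the leader
            store.TryAcquire(resource, "B", expiry, t0.AddSeconds(10)), // B is refused
            store.TryExtend(resource, "A", expiry, t0.AddSeconds(15)),  // A extends to t0 + 45s
            store.TryAcquire(resource, "B", expiry, t0.AddSeconds(40)), // still held by A
            // A crashes here and stops extending.
            store.TryAcquire(resource, "B", expiry, t0.AddSeconds(46)), // expired: B takes over
        };
    }
}
```

The point of the sketch is that a crashed leader never releases anything itself. Leadership changes hands only because the extensions stop and the expiry elapses, so the expiry value bounds how long the system can be without a leader.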


As we can see, implementing leader election via a distributed lock is pretty straightforward. Still, it should be used with care, since every lock increases contention between instances of a microservice and thus reduces the benefits of horizontal scaling.