Azure Service Bus Availability - 5 Aspects You Need To Keep Your Eye On

Introduction 

 
Mission-critical applications must keep running, even during unplanned outages and errors. Microsoft Azure guarantees high availability (99.9%) for sending and receiving messages on Service Bus Queues and Topics when they are properly configured.
 
Errors are bound to happen, but due to the design of Azure systems, issues tend to be short-lived. Nevertheless, many enterprises still want assurance that the Service Bus handling their business-critical data is always up and running. If you are among them, then this article is for you.
 
This article is intended to explain why the Service Bus may become unavailable due to component failure, server failure, or a faulty data center network switch. It does not cover disasters like floods or earthquakes, where data may be lost permanently.
 
To handle failures before they affect you, you must first understand what can cause the Azure Service Bus to be unavailable. Below are the most common reasons:
  1. The queue may go into a Disabled, Send Disabled, or Receive Disabled state
  2. The queue may accidentally be removed from the Service Bus namespace
  3. The subscription that contains the queue might have expired
  4. Throttling from an external system on which the Service Bus depends
  5. The quota on the queue might be exceeded
Now, let's dive into the individual challenges and analyze the workarounds.
 

The Queue may be in a Disabled/Send Disabled/Receive Disabled State

 
When there is a temporary outage, for example because of a server error, the entity typically becomes unavailable to client applications in one of the following ways:
  • ‘Send Disabled’ - sending messages to the queue is not possible
  • ‘Disabled’ - the queue will not be available for message send or receive operations
  • ‘Receive Disabled’ - receiving messages from the queue, other than peek lock, is not possible
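To make the difference between these states concrete, here is a minimal sketch of a lookup from state to the operations a client can expect to succeed. It is illustrative only: `allowed_operations` is a hypothetical helper, and the status strings are assumed spellings of the state names listed above, not values taken from a specific SDK.

```python
# Hypothetical helper: maps a queue state to the operations a client
# can expect to succeed. The state names are assumed spellings of the
# states described in this article.
ALLOWED = {
    "Active": {"send": True, "receive": True},
    "SendDisabled": {"send": False, "receive": True},
    "ReceiveDisabled": {"send": True, "receive": False},
    "Disabled": {"send": False, "receive": False},
}


def allowed_operations(status: str) -> dict:
    """Return which operations are expected to work for a given state.

    Any state not listed (for example "Unknown") is treated as fully
    unavailable, which is the conservative choice.
    """
    return dict(ALLOWED.get(status, {"send": False, "receive": False}))


if __name__ == "__main__":
    for name in ("Active", "SendDisabled", "ReceiveDisabled", "Disabled", "Unknown"):
        print(name, allowed_operations(name))
```

An application could consult a table like this before attempting a send, rather than discovering the state through a failed operation.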

The Queue may Accidentally be Removed from the Service Bus Namespace 

 
This scenario can happen in enterprises where a team member accidentally removes the queue, or even the Service Bus namespace itself. This could affect the business if the support or operations team does not notice it in time. The status of the queue will be “Unknown”, and it will not be available for any operations in the client applications.
 

The Subscription Containing the Queue might be Expired

 
This might happen because of a delay in renewing the subscription, or because the subscription was accidentally disabled while still in use, much like the scenario above. Either way, it affects every active queue in that subscription. Eventually, the queue will be reported with the status ‘Unknown’.
 
If you are looking for a solution that addresses the above-mentioned challenges under one roof, we've got your back.
 
Serverless360 can monitor the state of an Azure Service Bus Queue and notify you when the expected state is not met. The threshold monitor can be configured to send notifications for the above 3 scenarios.
 
The notification forwarded due to the unavailability of the queue will look similar to the above picture.
 
Moreover, if the outage has a temporary cause, the threshold monitor in Serverless360 can auto-correct the state of the queue back to active. This reduces manual intervention by the support person and helps fix the issue a lot faster.
 
Furthermore, you can set a number of retry attempts in order to auto-correct the expected state if the issue persists for a longer period of time.
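The idea of a bounded number of auto-correct attempts can be sketched in a few lines. This is not how Serverless360 implements it; it is a minimal illustration under stated assumptions. `get_status` and `set_active` are hypothetical callables that you would wire to your own management tooling, and "Active" is the assumed name of the expected state.

```python
import time
from typing import Callable


def auto_correct(
    get_status: Callable[[], str],
    set_active: Callable[[], None],
    max_attempts: int = 3,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to restore the expected state, at most max_attempts times.

    get_status and set_active are hypothetical hooks supplied by the
    caller. Returns True once "Active" is observed, False if every
    attempt was used without success.
    """
    if get_status() == "Active":
        return True  # nothing to correct
    for attempt in range(1, max_attempts + 1):
        set_active()
        if get_status() == "Active":
            return True
        if attempt < max_attempts:
            sleep(delay_seconds)  # wait before the next attempt
    return False
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real waiting. If the function returns False, the issue is probably not transient and a person should take a look.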
 

Throttling from an External System on which the Service Bus Depends


Microsoft states in its documentation that several thresholds affect the maximum throughput you can achieve before running into throttling conditions, such as the number of messages per transaction, the message size, and the size of the queue or topic. It is important to ensure your entity is not being throttled.
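On the client side, a common way to cope with throttling is to retry with exponential backoff and jitter. The sketch below shows that general technique, not anything specific to Service Bus. `ThrottledError` is a hypothetical stand-in for whatever error your client library raises when a request is throttled, and `send` is any callable that performs the send.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for the error that signals throttling."""


def send_with_backoff(
    send: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(), retrying on ThrottledError with exponential backoff.

    The wait before each retry is a random value between 0 and
    min(max_delay, base_delay * 2**attempt), often called full jitter.
    The last ThrottledError is re-raised once attempts run out.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

The random jitter spreads retries from many clients over time, so they do not all hit the throttled entity again at the same moment.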
 

The Quota on the Queue Might be Exceeded


When the messages already in the queue occupy its total size, no more messages can be sent to it. Any further attempt to send a message to the queue will result in a User error.
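A simple way to stay ahead of this is to compare the current size of the queue with its maximum size and raise a warning before the two meet. The sketch below assumes you can obtain both numbers from your management tooling; the parameter names `size_in_bytes` and `max_size_in_megabytes` and the 80% threshold are illustrative choices, not values prescribed by Azure.

```python
def queue_usage_fraction(size_in_bytes: int, max_size_in_megabytes: int) -> float:
    """Return how full the queue is, as a fraction between 0 and 1."""
    max_bytes = max_size_in_megabytes * 1024 * 1024
    if max_bytes <= 0:
        raise ValueError("max_size_in_megabytes must be positive")
    return size_in_bytes / max_bytes


def is_near_quota(
    size_in_bytes: int,
    max_size_in_megabytes: int,
    threshold: float = 0.8,  # assumed warning level, tune to taste
) -> bool:
    """True when the queue has reached the warning threshold."""
    return queue_usage_fraction(size_in_bytes, max_size_in_megabytes) >= threshold


if __name__ == "__main__":
    # A 1024 MB queue currently holding 900 MB of messages.
    print(is_near_quota(900 * 1024 * 1024, 1024))
```

Alerting on a fraction rather than an absolute size means the same rule works for queues with different maximum sizes.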
 
Bingo, even the last two challenges can be handled under the same roof: Serverless360. To provide an out-of-the-box solution, we have come up with another monitor, called the Data monitor, which helps you keep an eye on the Throttled Requests and User Errors metrics, and in fact on even more properties.
 
Real-time use case
 
If you are wondering why you should be concerned about Service Bus availability given the Microsoft SLA, this use case might help you understand its significance.
 
Consider a company, Northwind, that has a simple web application which pushes a message onto a Service Bus queue when a form is submitted.
 
The form takes approximately 5 minutes of a user’s time to fill out, and the company wants to ensure that the Service Bus is available when the user pushes the Submit button. Because they value the user’s time and don’t want to lose the business-critical message, they want the check done before the user fills in the form.
 
If they are notified that the Service Bus queue is unavailable, they can simply redirect the user to an error page, saving the user’s time, and have the form filled in later.
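The check Northwind wants can be sketched as a small decision function that runs before the form is rendered. This is a minimal illustration, not Northwind's or Serverless360's code. `get_queue_status` is a hypothetical callable that returns the current queue state, and the state names are assumed spellings of the states discussed earlier.

```python
from typing import Callable

# States in which sending is still expected to work (assumed names).
SENDABLE_STATES = {"Active", "ReceiveDisabled"}


def page_to_serve(get_queue_status: Callable[[], str]) -> str:
    """Decide whether to show the form or an error page.

    get_queue_status is a hypothetical hook. If it raises (for
    example because the queue was deleted or the subscription is
    disabled), the queue is treated as unavailable.
    """
    try:
        status = get_queue_status()
    except Exception:
        return "error"
    return "form" if status in SENDABLE_STATES else "error"


if __name__ == "__main__":
    print(page_to_serve(lambda: "Active"))
    print(page_to_serve(lambda: "SendDisabled"))
```

A check like this only narrows the window for failure. The queue can still become unavailable during the 5 minutes the user spends on the form, so the submit handler needs its own error handling as well.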
 
This is where Serverless360 comes into play, notifying the stakeholders about the unavailability of the Azure Service Bus through its extensive monitors. It also tries to bring the queue back to the active state via its unique “AutoCorrect” feature.
 

Conclusion

 
I hope you now have an understanding of the key things you need to keep track of to ensure your Azure Service Bus availability. You can also use third-party tooling like Serverless360 to make sure that the business-critical Azure Serverless service (Service Bus) is up and running.