How Global Distribution Works And How To Handle Data Center Outage In Cosmos DB


In this article, I am going to explain how global distribution works and how data center outages are handled in Cosmos DB. In my previous article, we discussed the basics of Cosmos DB.

How global distribution works

Azure Cosmos DB is classified as a foundational service in Azure, which means it is available by default in every new Azure region. So we can also distribute our database to new regions as they come online.

Azure Cosmos DB distinguishes two types of database: the primary database (also called the write database) and the secondary database (also called the read database). We can select the primary and secondary regions based on our business needs. We should choose regions carefully, because every additional region adds to the cost.

What is Primary Database?

The primary database is your main database. All insert, update and delete operations are performed on it. The primary database is mandatory and is created by default, which is why the location of the service acts as the write region.

The primary database is chosen automatically when you select the location while creating a new Cosmos DB service. Place your primary database in the region where you have the most clients.

What is Secondary Database?

The secondary database is a replica of the primary database. It is used to speed up data retrieval. It is not mandatory, and we can add it based on our needs. We can select any number of read regions, each with just a single click. But be cautious: every selected region adds cost. With secondary databases, data replication across regions around the globe is very fast and durable. That is the beauty of Cosmos DB.
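The split between the write database and the read databases can be pictured with a small model. This is only a sketch of the idea described above, not Cosmos DB's actual routing logic: the `Account` class, the `region_for` method and the region names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical model of one account: one write region, any number of read regions."""
    write_region: str
    read_regions: list = field(default_factory=list)

    def region_for(self, operation, preferred):
        """Return the region that serves an operation in this simplified model."""
        # Inserts, updates and deletes always go to the primary (write) database.
        if operation in {"insert", "update", "delete"}:
            return self.write_region
        # Reads can be served by any replica, so take the caller's first
        # preferred region that actually hosts a copy of the data.
        available = [self.write_region, *self.read_regions]
        for region in preferred:
            if region in available:
                return region
        # No preferred region hosts a replica: fall back to the write region.
        return self.write_region


account = Account("South India", ["West US", "North Europe"])
print(account.region_for("insert", ["West US"]))  # South India
print(account.region_for("read", ["West US"]))    # West US
```

In this model, adding a read region only changes where reads can be served from; writes still go to the single write region.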

Failover

Failover means recovering from a failure by switching to a standby. Natural disasters are unpredictable and we cannot avoid them, but precautionary plans and steps help us recover without any loss. In the picture above, I selected South India as my primary (write) region. Suppose that region is hit by a natural disaster and its data center suffers an outage. The application will go down because its write region is unavailable. To avoid this type of problem, we must set up the failover feature in Cosmos DB.

To handle this situation, we use the Replicate data globally option. I must mention that a data center outage is a rare event.

How Failover works

As I explained, we can have as many read data centers as we need. During a failover, one of the read data centers takes over as the write data center, chosen according to the priorities we assign to the read data centers. Priorities are evaluated from top to bottom (priority 1, 2, up to n). If the highest-priority data center is not available, the next one in the list is tried.
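The top-to-bottom priority rule can be sketched in a few lines. This is an illustrative model only, not the service's internal algorithm; the function name `pick_write_region`, the `healthy` set and the region names are assumptions made for the example.

```python
def pick_write_region(priorities, healthy):
    """Return the first region, in priority order, that is currently healthy."""
    for region in priorities:
        if region in healthy:
            return region
    raise RuntimeError("no healthy region is available")


# The list is ordered from highest to lowest priority.
priorities = ["South India", "Southeast Asia", "West US"]

# South India is down, so the next region in the list takes over writes.
print(pick_write_region(priorities, healthy={"Southeast Asia", "West US"}))
# Southeast Asia
```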

We can accomplish failover in two ways:

Manual Failover
Automatic Failover

Let me explain how to implement this feature step by step:

Create a Cosmos DB service.
Set one write region and at least one read region.
Click on Replicate data globally.
Then click Automatic failover.
Set Enable Automatic Failover to ON.
Drag and drop the read regions into our order of priority and click OK.
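The drag-and-drop ordering in the last step amounts to an ordered list of regions. As far as I know, the Azure management API expresses the same thing as a list of `locationName` / `failoverPriority` pairs, where priority 0 is the write region. The sketch below only builds that structure as a dictionary and does not call Azure; the helper name `build_failover_policies` is hypothetical.

```python
import json


def build_failover_policies(regions_in_priority_order):
    """Turn an ordered region list into a failover-policy structure.

    Index 0 is treated as the write region; the rest are read regions
    in the order they should be tried during a failover.
    """
    if not regions_in_priority_order:
        raise ValueError("at least one region is required")
    if len(set(regions_in_priority_order)) != len(regions_in_priority_order):
        raise ValueError("each region may appear only once")
    return {
        "failoverPolicies": [
            {"locationName": name, "failoverPriority": priority}
            for priority, name in enumerate(regions_in_priority_order)
        ]
    }


policies = build_failover_policies(["South India", "Southeast Asia", "West US"])
print(json.dumps(policies, indent=2))
```

Keeping the order in a single list makes it hard to end up with two regions sharing the same priority.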

Manual Failover

Manual failover is available in addition to automatic failover. We can manually change the write region of a specific account, either through the Azure portal or programmatically.
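Conceptually, a manual failover moves a region of our choice to the top of the priority list. The following is a minimal sketch of that reordering, assuming the priorities are kept as a simple ordered list; `manual_failover` is a hypothetical helper and not an SDK call.

```python
def manual_failover(priorities, new_write_region):
    """Return a new priority list with the chosen region promoted to the top."""
    if new_write_region not in priorities:
        raise ValueError(f"{new_write_region!r} is not a region of this account")
    # The chosen region becomes the write region; the others keep their
    # relative order, so the old write region becomes a read region.
    others = [region for region in priorities if region != new_write_region]
    return [new_write_region, *others]


current = ["South India", "Southeast Asia", "West US"]
print(manual_failover(current, "West US"))
# ['West US', 'South India', 'Southeast Asia']
```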

Why we need Failover

In enterprise applications, we often need to provide compliance certification for Business Continuity and Disaster Recovery (BCDR) and High Availability and Disaster Recovery (HADR).

We can test the BCDR readiness of applications that use Cosmos DB for storage by triggering a manual failover of the Cosmos DB account and/or by adding and removing a region dynamically.

Predictable clock model

If our applications have predictable traffic patterns based on the time of day, we can periodically move the write region to the most active geographic region for that time of day.
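A time-based choice of write region can be sketched as a lookup against a schedule. The schedule below is entirely hypothetical (the hour ranges and region names are made up for illustration), and the function only decides which region should hold the write role at a given moment. Actually switching the region would still be done with a manual failover.

```python
from datetime import datetime, timezone

# Hypothetical schedule: (start hour UTC, end hour UTC, busiest region).
SCHEDULE = [
    (0, 8, "Southeast Asia"),
    (8, 16, "North Europe"),
    (16, 24, "West US"),
]


def write_region_for(moment):
    """Return the region that should be the write region at the given time."""
    hour = moment.astimezone(timezone.utc).hour
    for start, end, region in SCHEDULE:
        if start <= hour < end:
            return region
    raise ValueError("schedule does not cover this hour")


print(write_region_for(datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc)))
# North Europe
```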

In this article, we have seen how global distribution works, how it handles failover, the types of failover, and the steps to implement them. I plan to write my next article on Cosmos DB consistency levels, how to set up multiple write data centers, and how to handle data duplication with multiple writes.