Azure Service Fabric - Understanding Cluster, Replicas, Instances And Partitions - Part Three


We started the series with an introduction to Microservices architecture and Azure Service Fabric. We then explored the programming models available in Azure Service Fabric, with a walkthrough of Reliable Services, Reliable Actors, Guest Executables, Containers and ASP.NET Core. In this article we will look at a few basic concepts in the Azure Service Fabric world: Clusters, Partitions, Replicas and Instances. These concepts will help us when we start with actual programming in the next set of articles.

As a prerequisite, it would be a good idea to go through the previous articles in this series before proceeding.


A Service Fabric cluster is a group of physical machines, virtual machines or containers that are closely networked together. Each machine or container in the cluster is called a Cluster Node. Figure 1 shows a simple representation of a Service Fabric Cluster with its Cluster Nodes. The Cluster Nodes are managed by the Service Fabric Cluster Manager.


Whenever we create a Service Fabric cluster on Azure, the components below are created and become an integral part of the cluster. This is illustrated in Figure 2.

Virtual Machine Scaleset 

In a Production Environment, the nodes of an Azure Service Fabric cluster are usually the Virtual Machines of a Virtual Machine Scaleset, with one Scaleset per node type. This facilitates Scaling and High Availability for the services deployed on those nodes.

Load Balancer 

Each Virtual Machine Scaleset is placed behind a Load Balancer, which distributes traffic among the Virtual Machines (nodes) in the Scaleset and helps meet scaling requirements.

Network Security Group 

Each Virtual Machine Scaleset is associated with a Network Security Group, which controls the network traffic allowed in and out.

Virtual Network 

All the nodes are connected to the same Virtual Network so that they are closely networked under a single umbrella.

Storage Accounts 

Stores diagnostics, logs and virtual hard disks.

Security components such as Azure Active Directory and Key Vault are also used by Service Fabric.


Instances and Replicas

An Instance is a copy of the Service Logic of a Stateless Service that runs on a Service Fabric node. In a cluster there can be multiple Instances of the same Service Logic running across nodes. The number of Instances can be increased to handle Scale Out requirements. When one of the Instances goes down, another spins up with the same Service Logic, thus ensuring High Availability.
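To make the idea of "another Instance spins up" concrete, here is a minimal toy sketch in Python. It is not the Service Fabric API; the `ToyCluster` class, its methods and the least-loaded placement rule are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model only: keeps a desired number of stateless Instances alive."""
    nodes: list[str]
    desired_instances: int
    placement: dict[str, str] = field(default_factory=dict)  # instance id -> node
    next_id: int = 1

    def reconcile(self) -> None:
        """Start Instances on the least-loaded node until the target count is met."""
        if not self.nodes:
            raise RuntimeError("no healthy nodes left to host Instances")
        while len(self.placement) < self.desired_instances:
            load = {node: 0 for node in self.nodes}
            for node in self.placement.values():
                load[node] += 1
            target = min(self.nodes, key=lambda node: load[node])
            self.placement[f"I{self.next_id}"] = target
            self.next_id += 1

    def fail_node(self, node: str) -> None:
        """Simulate a node going down: its Instances are lost, then replaced."""
        self.nodes.remove(node)
        self.placement = {i: n for i, n in self.placement.items() if n != node}
        self.reconcile()


cluster = ToyCluster(nodes=["Node1", "Node2", "Node3"], desired_instances=4)
cluster.reconcile()          # scale out to four Instances
cluster.fail_node("Node1")   # lost Instances are restarted on the remaining nodes
print(cluster.placement)
```

After the simulated failure the cluster still runs four Instances, none of them on the failed node, which is the High Availability behaviour described above.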

Figure 3 illustrates four Instances spread across three nodes. Instance I1 is hosted on Node 1 and Node 3. Instance I2 is hosted on Node 1 and Node 2. Instance I3 is hosted on Node 2 and Node 3. Instance I4 is hosted on Node 1 and Node 3.

Once the Service Fabric Cluster Manager has decided on which node to place an Instance, the Instance gets into the InBuild life cycle state. In this state the Instance boots up. Once the Instance has started, it moves into the Ready state. When shutdown of the Instance is initiated, it gets into the Closing state. Finally, it gets into the Dropped state when the Instance has been shut down. Note that from any state in the life cycle the Instance can transition directly to the Dropped state by Force mechanisms.
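The Instance life cycle described above can be sketched as a small state machine. This is an illustrative Python model, not Service Fabric code; the `advance` function and its `force_drop` flag are hypothetical names chosen for the sketch.

```python
from enum import Enum


class InstanceState(Enum):
    IN_BUILD = "InBuild"
    READY = "Ready"
    CLOSING = "Closing"
    DROPPED = "Dropped"


# Normal forward transitions named in the article.
NEXT_STATE = {
    InstanceState.IN_BUILD: InstanceState.READY,
    InstanceState.READY: InstanceState.CLOSING,
    InstanceState.CLOSING: InstanceState.DROPPED,
}


def advance(state: InstanceState, force_drop: bool = False) -> InstanceState:
    """Return the next life cycle state (hypothetical helper).

    force_drop models the Force mechanism: any state can go straight to Dropped.
    """
    if force_drop:
        return InstanceState.DROPPED
    if state not in NEXT_STATE:
        raise ValueError(f"{state.value} is a final state")
    return NEXT_STATE[state]


# Walk the normal path from InBuild to Dropped.
path = [InstanceState.IN_BUILD]
while path[-1] is not InstanceState.DROPPED:
    path.append(advance(path[-1]))
print(" -> ".join(s.value for s in path))

# A forced drop skips the remaining states.
forced = advance(InstanceState.IN_BUILD, force_drop=True)
```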

A Replica is a copy of the Service Logic of a Stateful Service, together with its associated state data, that runs on a Service Fabric node. Just like Instances, the Replicas of a Service can be spread across the nodes in the cluster. Figure 4 illustrates four Replicas R1, R2, R3, R4 spread across three nodes.

Read and write operations are performed on one Replica in the set, called the Primary Replica. The data written to the Primary Replica is synced to the other Replicas that host the same service logic. These Replicas are called Active Secondary Replicas. If the Primary Replica goes down for some reason, one of the Active Secondary Replicas is promoted to Primary Replica, thus ensuring High Availability.
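The Primary and Active Secondary roles can be illustrated with a toy model. The sketch below is plain Python, not the Service Fabric runtime: the `ReplicaSet` class is hypothetical, and copying every write to every secondary immediately, as well as promoting the first surviving secondary, are simplifying assumptions.

```python
class ReplicaSet:
    """Toy replica set: one Primary, the rest act as Active Secondaries."""

    def __init__(self, replica_names):
        # Each replica holds its own copy of the state.
        self.state = {name: {} for name in replica_names}
        self.primary = replica_names[0]

    @property
    def secondaries(self):
        return [name for name in self.state if name != self.primary]

    def write(self, key, value):
        """Writes go to the Primary and are then synced to the secondaries."""
        self.state[self.primary][key] = value
        for name in self.secondaries:
            self.state[name][key] = value

    def read(self, key):
        """Reads are served by the Primary."""
        return self.state[self.primary][key]

    def fail_primary(self):
        """Drop the Primary and promote a surviving secondary (assumed rule)."""
        failed = self.primary
        del self.state[failed]
        if not self.state:
            raise RuntimeError("no replica left to promote")
        self.primary = next(iter(self.state))
        return failed


replicas = ReplicaSet(["R1", "R2", "R3"])
replicas.write("order-1", "paid")
replicas.fail_primary()                 # R1 goes down
print(replicas.primary, replicas.read("order-1"))
```

Because the data was already synced before the failure, the newly promoted Primary can serve the same value without any loss.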

A Replica, when started, gets into the InBuild state and then transitions to the Ready state. While it is being shut down or removed from the cluster it gets into the Closing state. A Replica can transition to the Down state whenever there is a fault, an upgrade of the node or application, or a crash in the replica code. In this state the replica code is not running but its state is persisted. When the Replica starts recovering from the Down state it gets into the Opening state. If there is a crash or equivalent during the Opening state, the Replica goes back to the Down state. If opening succeeds and the Replica recovers, it gets into the Standby state, where it waits for its turn to be included in the active replica set in which the service logic is running. A Standby Replica still has the data it held before going Down, so it can be brought back into the active replica set by handling only the delta data.
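The Down, Opening and Standby loop is easier to follow as an event-driven transition table. The Python sketch below covers only the transitions named in this article; the event names (`build_completed`, `fault` and so on) are hypothetical labels, and modelling the fault as coming from the Ready state is an assumption.

```python
# (current state, event) -> next state; events are illustrative labels only.
TRANSITIONS = {
    ("InBuild", "build_completed"): "Ready",
    ("Ready", "close_requested"): "Closing",
    ("Ready", "fault"): "Down",        # fault, upgrade, or crash in replica code
    ("Down", "restart"): "Opening",
    ("Opening", "crash"): "Down",      # crash while opening
    ("Opening", "opened"): "Standby",  # recovered, waiting to rejoin the active set
}


def step(state, event):
    """Apply one event; reject transitions the table does not list."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state} on {event!r}") from None


def run(state, events):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = step(state, event)
    return state


# A replica that faults, crashes once while reopening, then recovers.
final = run("InBuild", ["build_completed", "fault", "restart",
                        "crash", "restart", "opened"])
print(final)
```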


Let us try to understand Partitions using a simple real world example of book racks. Suppose we have three book racks holding five hundred books in no particular order. It would be difficult and time consuming to locate a book, because we would have to search across all the books to find the required one.

But what if we arrange the books alphabetically, keeping books starting with the letters A to I in the first rack, books starting with the letters J to R in the second rack and the rest of the books in the third rack? It would be much easier to search for a book, as our search scope is reduced. Suppose the book title starts with M. We need not search the first and third racks and can look for the book directly in the second rack. We only need to search the books kept in the second rack and not the whole set of five hundred books. This is more efficient and less time consuming. Technically, we have partitioned the books into three racks. In the same way, service replicas and instances can be partitioned in Service Fabric.
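The book rack example translates directly into a small range-partitioning sketch. This is illustrative Python only; the function names and the sample titles are made up for the example.

```python
def rack_for(title: str) -> str:
    """Pick the rack (partition) from the first letter of the title."""
    first = title.strip()[0].upper()
    if "A" <= first <= "I":
        return "Rack 1"
    if "J" <= first <= "R":
        return "Rack 2"
    return "Rack 3"  # the rest of the books


def build_racks(titles):
    """Place every book in its rack."""
    racks = {"Rack 1": [], "Rack 2": [], "Rack 3": []}
    for title in titles:
        racks[rack_for(title)].append(title)
    return racks


def find(racks, title):
    """Search only the one rack that can contain the title."""
    return title in racks[rack_for(title)]


# Hypothetical sample titles.
racks = build_racks(["Algorithms", "Moby Dick", "Java Basics", "Walden", "Emma"])
print(rack_for("Moby Dick"), find(racks, "Moby Dick"))
```

A lookup for a title starting with M touches only the second rack, which is the reduced search scope that partitioning gives us.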

In Figure 5 we have four Replicas spread across four partitions hosted on three nodes. Each incoming request is routed to the right partition, so the requests are handled in a performant manner. Partitioning Instances is supported, but it usually makes less sense because Instances hold no state. Replicas, on the other hand, can be partitioned based on data rules to handle the incoming requests. At times, however, there are specific needs, such as a request that has to be routed only to a specific service instance and stay sticky to it. In such cases Instances can be partitioned.

Winding up

This article provided an understanding of Clusters, Instances, Replicas and Partitions. In the next article we will start building Microservices applications using Azure Service Fabric.