Azure Kubernetes Service - Scaling

Introduction

 
By now, we have explored the architecture and networking options for Azure Kubernetes Service in the earlier articles in this series.
Scaling is an essential architectural concern for any application we build, as it keeps the application performant. During peak hours, more computing resources should be allocated to the application; when the number of requests decreases, the extra computing resources should be deallocated. In this way, scaling guarantees application performance by allocating the right amount of computing resources at the right time.
 
Modern cloud platforms provide the following ways to scale an application.
  • You can scale the application manually by adding or removing compute resources as and when needed.
  • You can define performance criteria, such as a CPU or memory usage limit, based on which the underlying platform scales the application. If the application exceeds these limits, the platform allocates extra computing resources to handle the incoming load.
  • You can let the underlying platform decide when and how to scale. This relatively new approach was introduced by Serverless computing.
In Azure Kubernetes Service, you can scale manually, scale automatically based on performance criteria, or let the underlying platform manage scaling entirely. In this article, let us explore the scaling options available for Azure Kubernetes Service.
 

Manual Scaling in Azure Kubernetes Service

 
Here you can scale either the nodes or the pods manually to meet the computing requirements of the incoming requests. We define the Azure Kubernetes Service cluster configuration, such as the number of pods, replica sets, container images, and more, in the deployment configuration YAML file, and we use the kubectl command to apply the configuration to the cluster. To scale the number of pods manually, you can run the kubectl scale command as follows.
  kubectl scale --replicas=3 deployment/myapplication
Here we are scaling the Pods for the myapplication deployment to 3 replicas.
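The replica count can also be set declaratively in the deployment manifest instead of imperatively with kubectl scale. A minimal sketch follows; the deployment name matches the example above, while the labels and container image are hypothetical placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapplication
spec:
  replicas: 3                # desired number of Pods
  selector:
    matchLabels:
      app: myapplication
  template:
    metadata:
      labels:
        app: myapplication
    spec:
      containers:
      - name: myapplication
        image: myregistry.azurecr.io/myapplication:v1   # hypothetical image
```

Applying this file with kubectl apply -f keeps the replica count under version control, whereas kubectl scale changes it on the fly.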
 
If this command is not familiar yet, do not worry, as we are going to explore the kubectl commands in the subsequent articles.
 
You can scale the nodes manually using the Azure CLI command as follows.
  az aks scale --resource-group myaksresourcegroup --name myaks --node-count 5
Here we are scaling the nodes count in the AKS cluster named myaks in the Resource Group myaksresourcegroup to 5.
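If the cluster has more than one node pool, you can scale a specific pool instead of the whole cluster; a sketch using the same resource names as above, where the node pool name mynodepool is a hypothetical placeholder:

```shell
az aks nodepool scale \
  --resource-group myaksresourcegroup \
  --cluster-name myaks \
  --name mynodepool \
  --node-count 5
```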
 

Automatic Scaling in Azure Kubernetes Service

 
As with manual scaling, you can scale both the nodes and the pods automatically, based on the performance limits you provide. You can set the minimum and the maximum number of pods and nodes in the deployment configuration and apply the configuration to the Kubernetes cluster using the kubectl apply command. The underlying platform monitors the performance limits you set and scales the nodes and the pods within those minimum and maximum bounds.
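As a sketch of such a configuration, the minimum and maximum Pod counts can be declared in a HorizontalPodAutoscaler manifest; the deployment name and the thresholds below are illustrative assumptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapplication-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapplication        # the deployment to scale
  minReplicas: 2               # never fewer than 2 Pods
  maxReplicas: 10              # never more than 10 Pods
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add Pods when average CPU crosses 70%
```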
 
Azure Kubernetes Service uses the Horizontal Pod Autoscaler to scale the Pods. Based on the performance limit criteria, it gets the performance metrics from the Metrics Server, a Kubernetes component that aggregates resource usage data from the cluster. The Horizontal Pod Autoscaler keeps querying the Metrics Server for the current usage. Once the configured limit is reached, it adds new Pods to the deployment, and it removes the extra Pods when the load decreases.
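The same behavior can also be set up imperatively with kubectl; a sketch, assuming a deployment named myapplication and an illustrative 70% CPU target:

```shell
# Create a Horizontal Pod Autoscaler for the myapplication deployment
kubectl autoscale deployment myapplication --cpu-percent=70 --min=2 --max=10

# Inspect the current and target metrics of the autoscaler
kubectl get hpa myapplication
```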
 
Azure Kubernetes Service uses the Cluster Autoscaler to scale the Nodes. The Cluster Autoscaler watches for Pods that cannot be scheduled because the existing Nodes lack resources, and once that happens, it adds new Nodes to the cluster within the minimum and the maximum number of nodes defined. It removes underutilized Nodes when the load decreases.
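On an existing AKS cluster, the Cluster Autoscaler and its node bounds can be enabled through the Azure CLI; a sketch using the resource names from the earlier examples, with the minimum and maximum counts as assumptions:

```shell
az aks update \
  --resource-group myaksresourcegroup \
  --name myaks \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 5
```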
 
Figure 1 depicts the Horizontal Pod Autoscaler and the Cluster Autoscaler in action.
 
Figure 1
 

Burst Scaling using Serverless Virtual Nodes

 
The Cluster Autoscaler takes some time to spin up new Nodes. There is always a delay in provisioning a new Node, which is a Virtual Machine, and making it ready to host Pods. Serverless Virtual Nodes can be used instead of Virtual Machine Nodes. Virtual Nodes run Pods on Azure Container Instances, which spin up in seconds and make scaling very fast. Serverless Nodes can burst into multiple instances almost instantly and absorb the incoming load much faster than Nodes based on Virtual Machines. The underlying platform manages the number of instances based on the incoming traffic, and you do not have any control over it. Virtual Nodes are based on the open-source Virtual Kubelet implementation.
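Virtual Nodes are enabled as an AKS add-on on a cluster that uses Azure CNI networking; a sketch using the resource names from the earlier examples, where the subnet name is a hypothetical placeholder for a dedicated subnet in the cluster's virtual network:

```shell
az aks enable-addons \
  --resource-group myaksresourcegroup \
  --name myaks \
  --addons virtual-node \
  --subnet-name myvirtualnodesubnet
```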
 
You can use Serverless Virtual Nodes along with the Cluster Autoscaler and the Horizontal Pod Autoscaler to build a highly performant application. Figure 2 depicts the Cluster Autoscaler and the Horizontal Pod Autoscaler working alongside the Serverless Virtual Nodes based on Azure Container Instances.
 
Figure 2
 

Winding up

 
In this article, we explored the Scaling options available for Azure Kubernetes Service. In the next article, we will explore the Storage options provided by the Azure Kubernetes Cluster.