Configuring Horizontal Pod Autoscaling For Azure Kubernetes Services

Introduction 

 
Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment, replica set, or stateful set based on observed CPU utilization or other application-provided metrics.
 
This post assumes that you already have a microservice application deployed on an AKS cluster. To set up HPA for a cluster deployed on AKS, follow the steps below.
 

Install metrics-server on your AKS cluster

 
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
 
You will know that metrics-server is successfully installed if you get proper output from the following commands (illustrative sample output follows the list):
  • kubectl top pod
  • kubectl top nodes
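If metrics-server is working, the commands return per-pod and per-node resource usage. The names and figures below are only illustrative; your cluster will show its own pods and nodes:

kubectl top pod

NAME                                CPU(cores)   MEMORY(bytes)
loginservicedapr-7d9f8c6b5d-x2k4p   12m          58Mi

kubectl top nodes

NAME                                CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
aks-nodepool1-12345678-vmss000000   190m         10%    1520Mi          33%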
After metrics-server is successfully set up, it is time to make sure that the resource requests and limits are properly configured in the Kubernetes manifest file. The following is a sample manifest file where the request and limit are configured for CPU:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: C:\ProgramData\chocolatey\lib\kubernetes-kompose\tools\kompose.exe convert
    kompose.version: 1.21.0 (992df58d8)
  creationTimestamp: null
  labels:
    io.kompose.service: loginservicedapr
  name: loginservicedapr
spec:
  replicas: 1
  selector:
    matchLabels:
      io.kompose.service: loginservicedapr
  strategy: {}
  template:
    metadata:
      annotations:
        kompose.cmd: C:\ProgramData\chocolatey\lib\kubernetes-kompose\tools\kompose.exe convert
        kompose.version: 1.21.0 (992df58d8)
      creationTimestamp: null
      labels:
        io.kompose.service: loginservicedapr
    spec:
      containers:
      - image: loginservicedapr:latest
        imagePullPolicy: ""
        name: loginservicedapr
        resources:
          requests:
            cpu: "250m"
          limits:
            cpu: "500m"
        ports:
        - containerPort: 80
      restartPolicy: Always
      serviceAccountName: ""
      volumes: null
status: {}
Please note that the above deployment file, with its requests and limits, will work on a Kubernetes cluster deployed in the cloud, such as Azure or AWS. The same syntax might not work for an on-premises deployment.
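If you have only just added the requests and limits to an existing deployment, re-apply the manifest so the change takes effect. The filename here is just an example; use whatever name you saved the manifest under:

kubectl apply -f loginservicedapr-deployment.yaml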
 

A Little Tip 

 
The ‘kompose’ labels you see in the above deployment file were added by the Kompose conversion tool. I had initially developed my microservice on Docker, and the deployment file was written for Docker Compose. So, when I had to deploy it to AKS, I used Kompose to convert my docker-compose file into Kubernetes manifests.
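For reference, the conversion itself is a single Kompose command; the compose filename below is an assumption, so adjust it to match yours:

kompose convert -f docker-compose.yml

This generates a Kubernetes Deployment and Service YAML file for each service defined in the compose file.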
 
The following is the HPA manifest for the above service:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: loginservicedapr-hpa
spec:
  maxReplicas: 10 # define max replica count
  minReplicas: 3  # define min replica count
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: loginservicedapr
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Please note that I have relied on the default scaling algorithm provided by Kubernetes HPA itself; the max and min replica counts and the average utilization percentage above are simply the numbers I chose for this service. Based on your application’s metrics, you can devise your own numbers.
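For context, the default algorithm the HPA controller uses to decide the replica count is documented by Kubernetes as:

desiredReplicas = ceil( currentReplicas × ( currentMetricValue / desiredMetricValue ) )

With the 50% target above, if 3 replicas are averaging 100% CPU utilization, the controller will scale out to ceil(3 × 100 / 50) = 6 replicas, bounded by minReplicas and maxReplicas.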
 
To deploy the HPA file, do the following:
 
kubectl apply -f loginservicedapr-hpa.yaml
 
If the HPA is properly deployed and configured, you should get an output like the one below:
 
[Screenshot: output for a correctly configured HPA]
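Concretely, a healthy autoscaler listed with kubectl get hpa typically looks something like this; the utilization figures and age are only illustrative:

kubectl get hpa

NAME                   REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
loginservicedapr-hpa   Deployment/loginservicedapr   10%/50%   3         10        3          5m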
 
In case you receive an output like the one below, the HPA is not properly deployed or configured:
 
[Screenshot: HPA output indicating a problem]
 
In this case, you have to check the exact error that the HPA is reporting internally:
 
[Screenshot: HPA conditions and events showing the error details]
 
Troubleshoot based on the reason given for whichever conditions have their status set to ‘False’ in the above output.
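Those conditions, and the reasons behind them, can be inspected with kubectl describe; the autoscaler name matches the manifest above:

kubectl describe hpa loginservicedapr-hpa

The Conditions section of the output lists entries such as AbleToScale and ScalingActive together with a Reason and Message for each, and the Events section records recent errors, for example a failure to fetch metrics from metrics-server.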
 
To check whether the HPA is functioning as expected, you can devise a load test using one of the open-source load-testing platforms. I have used LoadImpact’s cloud-based SaaS Test Builder (https://k6.io/docs/cloud/creating-and-running-a-test/test-builder).
 
Use the load test to continuously send requests to one of the APIs in the service deployed above. The free subscription with LoadImpact allows up to 50 virtual users. When you start the load test, you should see an increase in CPU utilization:
 
[Screenshot: CPU utilization increasing as the load test runs]
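If you prefer a quick command-line alternative to a SaaS load test, you can generate similar load from inside the cluster with a temporary busybox pod that calls the service in a loop. The service URL below is an assumption based on the service name; replace it with your service’s cluster address:

kubectl run load-generator --rm -i --tty --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://loginservicedapr; done"

Stop it with Ctrl+C once you have seen the HPA react.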
 
Send enough requests that CPU utilization crosses the threshold, and check whether the HPA scales out the number of replicas.
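You can watch the scale-out happen live with the --watch flag, which keeps refreshing the listing as the replica count changes:

kubectl get hpa loginservicedapr-hpa --watch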

