Configuring Horizontal Pod Autoscaling For Azure Kubernetes Services

Introduction 

 
Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment, replica set, or stateful set based on observed CPU utilization or other application-provided metrics.
 
This post assumes that you already have a microservice application deployed on an AKS cluster. To set up HPA for a cluster deployed on AKS, follow the steps below.
 

Install metrics-server on your AKS cluster

 
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
 
You will know that metrics-server is successfully installed if you get proper output from the following commands (illustrative sample output follows the list):
  • kubectl top pod
  • kubectl top nodes
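If metrics-server is working, the commands return per-pod and per-node resource usage. The names and figures below are only illustrative; your cluster will show its own pods and nodes:

kubectl top pod

NAME                                CPU(cores)   MEMORY(bytes)
loginservicedapr-7d9f8c6b5d-x2k4p   12m          58Mi

kubectl top nodes

NAME                                CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
aks-nodepool1-12345678-vmss000000   190m         10%    1520Mi          33%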
After metrics-server is successfully set up, it is time to make sure that the resource requests and limits are properly configured in the Kubernetes manifest file. The following is a sample manifest file where the request and limit are configured for CPU:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: C:\ProgramData\chocolatey\lib\kubernetes-kompose\tools\kompose.exe convert
    kompose.version: 1.21.0 (992df58d8)
  creationTimestamp: null
  labels:
    io.kompose.service: loginservicedapr
  name: loginservicedapr
spec:
  replicas: 1
  selector:
    matchLabels:
      io.kompose.service: loginservicedapr
  strategy: {}
  template:
    metadata:
      annotations:
        kompose.cmd: C:\ProgramData\chocolatey\lib\kubernetes-kompose\tools\kompose.exe convert
        kompose.version: 1.21.0 (992df58d8)
      creationTimestamp: null
      labels:
        io.kompose.service: loginservicedapr
    spec:
      containers:
      - image: loginservicedapr:latest
        imagePullPolicy: ""
        name: loginservicedapr
        resources:
          requests:
            cpu: "250m"
          limits:
            cpu: "500m"
        ports:
        - containerPort: 80
      restartPolicy: Always
      serviceAccountName: ""
      volumes: null
status: {}
Please note that the above deployment file, with its requests and limits, will work on a Kubernetes cluster deployed in the cloud, such as Azure or AWS. The same syntax might not work for an on-premises deployment.
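If you have only just added the requests and limits to an existing deployment, re-apply the manifest so the change takes effect. The filename here is just an example; use whatever name you saved the manifest under:

kubectl apply -f loginservicedapr-deployment.yaml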
 

A Little Tip 

 
The ‘kompose’ labels you see in the above deployment file were added by the Kompose conversion tool. I had initially developed my microservice on Docker, and the deployment file was written for Docker Compose. So, when I had to deploy it to AKS, I used Kompose to convert my docker-compose file into Kubernetes manifests.
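For reference, the conversion itself is a single Kompose command; the compose filename below is an assumption, so adjust it to match yours:

kompose convert -f docker-compose.yml

This generates a Kubernetes Deployment and Service YAML file for each service defined in the compose file.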
 
The following is the HPA manifest for the above service:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: loginservicedapr-hpa
spec:
  maxReplicas: 10 # define max replica count
  minReplicas: 3  # define min replica count
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: loginservicedapr
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Please note that I have relied on the default scaling algorithm provided by Kubernetes HPA itself; the max and min replica counts and the average utilization percentage above are simply the numbers I chose for this service. Based on your application’s metrics, you can devise your own numbers.
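For context, the default algorithm the HPA controller uses to decide the replica count is documented by Kubernetes as:

desiredReplicas = ceil( currentReplicas × ( currentMetricValue / desiredMetricValue ) )

With the 50% target above, if 3 replicas are averaging 100% CPU utilization, the controller will scale out to ceil(3 × 100 / 50) = 6 replicas, bounded by minReplicas and maxReplicas.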
 
To deploy the HPA file, do the following:
 
kubectl apply -f loginservicedapr-hpa.yaml
 
If the HPA is properly deployed and configured, you should get an output like the one below:
 
[Screenshot: output for a correctly configured HPA]
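Concretely, a healthy autoscaler listed with kubectl get hpa typically looks something like this; the utilization figures and age are only illustrative:

kubectl get hpa

NAME                   REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
loginservicedapr-hpa   Deployment/loginservicedapr   10%/50%   3         10        3          5m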
 
In case you receive an output like the one below, the HPA is not properly deployed or configured:
 
[Screenshot: HPA output indicating a problem]
 
In this case, you have to check the exact error that the HPA is reporting internally:
 
[Screenshot: HPA conditions and events showing the error details]
 
Troubleshoot based on the reason given for whichever conditions have their status set to ‘False’ in the above output.
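Those conditions, and the reasons behind them, can be inspected with kubectl describe; the autoscaler name matches the manifest above:

kubectl describe hpa loginservicedapr-hpa

The Conditions section of the output lists entries such as AbleToScale and ScalingActive together with a Reason and Message for each, and the Events section records recent errors, for example a failure to fetch metrics from metrics-server.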
 
To check whether the HPA is functioning as expected, you can devise a load test using one of the open-source load-testing platforms. I have used LoadImpact’s cloud-based SaaS Test Builder (https://k6.io/docs/cloud/creating-and-running-a-test/test-builder).
 
Use the load test to continuously send requests to one of the APIs in the service deployed above. The free subscription with LoadImpact allows up to 50 virtual users. When you start the load test, you should see an increase in CPU utilization:
 
[Screenshot: CPU utilization increasing as the load test runs]
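If you prefer a quick command-line alternative to a SaaS load test, you can generate similar load from inside the cluster with a temporary busybox pod that calls the service in a loop. The service URL below is an assumption based on the service name; replace it with your service’s cluster address:

kubectl run load-generator --rm -i --tty --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://loginservicedapr; done"

Stop it with Ctrl+C once you have seen the HPA react.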
 
Send enough requests that CPU utilization crosses the threshold, and check whether the HPA scales out the number of replicas.
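You can watch the scale-out happen live with the --watch flag, which keeps refreshing the listing as the replica count changes:

kubectl get hpa loginservicedapr-hpa --watch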

