
How to Configure Autoscaling in Kubernetes Using Horizontal Pod Autoscaler

Introduction

In real-world cloud applications, user traffic is rarely constant. Sometimes your application gets very high traffic (for example, during sales or peak hours), and sometimes it is almost idle. If you keep a fixed number of pods in Kubernetes, you either face performance issues during high traffic or waste resources during low traffic.

This is where Kubernetes autoscaling comes into play. Using the Horizontal Pod Autoscaler (HPA), you can automatically increase or decrease the number of pods based on CPU usage, memory usage, or other metrics.

In this step-by-step Kubernetes autoscaling tutorial, you will learn how to configure HPA, explained in simple terms with practical examples.

What is Horizontal Pod Autoscaler in Kubernetes?

Horizontal Pod Autoscaler (HPA) is a Kubernetes feature that automatically adjusts the number of pod replicas in a workload, such as a Deployment, based on observed resource usage like CPU or memory.

Simple explanation:

  • If load increases → pods increase

  • If load decreases → pods decrease

Real-life example:
Think of a food delivery app at dinner time. When many users place orders, more delivery agents (pods) are needed. When orders drop, fewer agents are required. HPA works the same way.

Prerequisites for Kubernetes Autoscaling

Before you configure autoscaling in Kubernetes, make sure the following things are ready:

  • A working Kubernetes cluster (local, or managed in the cloud on AWS, Azure, or GCP)

  • kubectl installed and configured

  • Metrics Server installed (required for CPU- and memory-based HPA)

  • A running deployment to scale

Without Metrics Server (or another provider of the Kubernetes Metrics API), HPA cannot read pod CPU or memory usage, so resource-based autoscaling will not work.

Step 1: Install Metrics Server in Kubernetes

Metrics Server is required for CPU- and memory-based HPA because it collects resource usage from each node's kubelet and exposes it through the Metrics API.

Run this command:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

After installing, verify that the Metrics Server deployment is available:

kubectl get deployment metrics-server -n kube-system

Simple understanding:
Metrics Server acts like a monitoring tool that tells Kubernetes how much CPU and memory your pods are using.

Step 2: Create or Verify a Deployment

Before applying autoscaling, you must have a deployment running.

Example Kubernetes deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app-container
        image: nginx
        resources:
          requests:
            cpu: "100m"
          limits:
            cpu: "500m"

Important concept:

  • CPU request = minimum guaranteed CPU

  • CPU limit = maximum CPU allowed

Why this matters:
HPA calculates CPU utilization as a percentage of the CPU request. If you don't define a request, utilization cannot be calculated and HPA will not scale on CPU.

Real-world mistake:
Many beginners forget to define CPU requests, and then HPA reports that it cannot compute CPU utilization instead of scaling.
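To make the role of the request concrete, here is a small Python sketch of how utilization is derived: the total CPU usage of the pods divided by their total CPU request, as a whole-number percentage. It is a simplified model that assumes every pod has a single container with the same request; the function name is hypothetical and not part of any Kubernetes API.

```python
from typing import Optional, Sequence


def average_cpu_utilization(
    usage_millicores: Sequence[int], request_millicores: Optional[int]
) -> Optional[int]:
    """Simplified model of HPA CPU utilization (hypothetical helper).

    Assumes every pod has one container with the same CPU request.
    Returns None when no request is set or there are no pods, because
    utilization is then undefined and the HPA cannot act on it.
    """
    if not request_millicores or not usage_millicores:
        return None
    total_usage = sum(usage_millicores)
    total_request = request_millicores * len(usage_millicores)
    # Integer percentage of the requested CPU that is actually in use.
    return total_usage * 100 // total_request


# Two pods, each using 80m of CPU.
print(average_cpu_utilization([80, 80], 100))   # 80: above a 50% target
print(average_cpu_utilization([80, 80], 50))    # 160: same load, request too low
print(average_cpu_utilization([80, 80], None))  # None: no request defined
```

The second call shows why an undersized request makes the HPA scale more aggressively: the same 80m of usage reads as 160% utilization.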

Step 3: Create an HPA Using a kubectl Command

You can quickly configure autoscaling using a simple command:

kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10

Explanation in simple words:

  • cpu-percent=50 → target average CPU utilization, as a percentage of each pod's CPU request

  • min=2 → minimum pods always running

  • max=10 → maximum pods allowed

Real-world example:
If average CPU utilization across your pods rises above 50% of their requested CPU, Kubernetes automatically adds pods, up to the maximum of 10, to handle the load.
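The Kubernetes documentation describes the core HPA calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when that ratio is within a tolerance (0.1 by default) of 1.0. The Python sketch below applies the formula and then clamps the result to the min/max bounds from the command above. It deliberately ignores details the real controller handles, such as missing metrics, pods that are not ready yet, and scale-down stabilization; the function name is hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA replica calculation (hypothetical helper)."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Respect --min and --max.
    return max(min_replicas, min(max_replicas, desired))


# Target 50%, min 2, max 10 (as in the kubectl autoscale example).
print(desired_replicas(2, 90, 50, 2, 10))   # 4: ceil(2 * 1.8)
print(desired_replicas(4, 52, 50, 2, 10))   # 4: within tolerance, no change
print(desired_replicas(4, 10, 50, 2, 10))   # 2: ceil(0.8) = 1, raised to min
print(desired_replicas(6, 200, 50, 2, 10))  # 10: 24 capped at max
```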

Step 4: Configure HPA Using YAML (Recommended for Production)

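For real-world projects, define the HPA in a YAML manifest. If you already created an HPA with kubectl autoscale in Step 3, delete it first (kubectl delete hpa my-app) so that two autoscalers do not manage the same deployment. Then save the following as hpa.yaml: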

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

Apply the configuration:

kubectl apply -f hpa.yaml

Why YAML is better:

  • Easy to manage in DevOps pipelines

  • Version controlled

  • Reusable across environments

Step 5: Verify Kubernetes Autoscaling

To check whether the HPA is working:

kubectl get hpa

For detailed information:

kubectl describe hpa my-app-hpa

This will show:

  • Current and target CPU utilization

  • Number of replicas

  • Scaling events

Step 6: Test Autoscaling in Kubernetes

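The load generator below sends requests to http://my-app, which requires a Service with that name. If you have not created one yet, expose the deployment first (nginx listens on port 80):

kubectl expose deployment my-app --port=80

Then start a temporary pod to generate load: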

kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh

Inside the container, run the following loop (press Ctrl+C to stop it):

while true; do wget -q -O- http://my-app; done

Now monitor scaling:

kubectl get hpa -w

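When average CPU utilization exceeds the 50% target, the replica count rises. After you stop the load, replicas drop back toward the minimum, but only after a delay (about five minutes by default).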

Simple understanding:
More traffic → more pods
Less traffic → fewer pods
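Scaling down is slower than scaling up on purpose. By default the HPA controller re-evaluates metrics every 15 seconds, and when scaling down it uses the highest recommendation from the last 300 seconds (configurable through behavior.scaleDown.stabilizationWindowSeconds in autoscaling/v2), which prevents flapping when traffic briefly dips. The Python sketch below is a simplified model of that window, assuming the default scale-up window of 0 and ignoring scaling policies (rate limits); the class name is hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class ScaleDownStabilizer:
    """Simplified model of the HPA scale-down stabilization window."""

    def __init__(self, window_seconds: int = 300) -> None:
        self.window_seconds = window_seconds
        # (timestamp in seconds, raw desired replicas), oldest first.
        self.history: Deque[Tuple[float, int]] = deque()

    def recommend(self, now: float, current: int, raw_desired: int) -> int:
        self.history.append((now, raw_desired))
        # Forget recommendations that fell out of the window.
        while self.history and self.history[0][0] <= now - self.window_seconds:
            self.history.popleft()
        if raw_desired >= current:
            return raw_desired  # scale-up applies immediately (window 0)
        # Scale-down: never go below the highest recent recommendation.
        return min(current, max(r for _, r in self.history))


s = ScaleDownStabilizer()
print(s.recommend(0, 8, 8))    # 8
print(s.recommend(60, 8, 3))   # 8: load dropped, but 8 was seen recently
print(s.recommend(240, 8, 3))  # 8: still inside the 300 s window
print(s.recommend(301, 8, 3))  # 3: the old recommendation of 8 expired
```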

Advanced Kubernetes Autoscaling Configuration

You can improve autoscaling using advanced options.

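Memory-based autoscaling (add this entry under spec.metrics; your containers must also set a memory request, which the example deployment above does not):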

- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 60

Multiple metrics:
You can list both CPU and memory metrics in the same HPA so that it reacts to whichever resource is under more pressure.
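According to the Kubernetes documentation, when an HPA lists several metrics it computes a proposed replica count for each one and uses the largest. The sketch below models that rule for a CPU and a memory target; it omits the tolerance check and other controller details, and the function name and sample numbers are hypothetical.

```python
import math
from typing import Dict, Tuple


def desired_for_metrics(
    current_replicas: int,
    metrics: Dict[str, Tuple[float, float]],
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Pick the largest per-metric proposal (simplified, no tolerance).

    metrics maps a metric name to (current utilization, target utilization).
    """
    proposals = [
        math.ceil(current_replicas * current / target)
        for current, target in metrics.values()
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))


# 3 replicas: CPU at 40% (target 50%), memory at 90% (target 60%).
print(desired_for_metrics(3, {"cpu": (40, 50), "memory": (90, 60)}, 2, 10))
```

Here CPU alone would propose 3 replicas (ceil of 2.4) and memory proposes 5 (ceil of 4.5), so the HPA scales to 5.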

Custom metrics:
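Using a metrics adapter such as Prometheus Adapter, which exposes Prometheus data through the Kubernetes custom or external metrics APIs, you can scale based on: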

  • Requests per second

  • Queue length

  • Business metrics

When to Use Kubernetes HPA

Use Kubernetes Horizontal Pod Autoscaler when:

  • Traffic is unpredictable

  • You want automatic scaling in Kubernetes

  • You need cost optimization

Avoid HPA when:

  • Your application is not designed to run as multiple replicas (for example, it keeps session state in local memory)

  • Metrics are not reliable

Advantages of Kubernetes Autoscaling

  • Automatically handles traffic spikes

  • Improves application performance

  • Reduces infrastructure cost

  • No manual scaling required

Disadvantages and Common Mistakes

  • Metrics Server must be installed

  • Scaling is not instant: metrics are checked periodically, new pods need time to start, and scale-down is deliberately delayed

  • Wrong CPU requests lead to wrong scaling

Real-world mistake:
If the CPU request is set too low, normal usage already looks like high utilization, so the HPA keeps adding pods unnecessarily.

Best Practices for Kubernetes Autoscaling

  • Always define CPU and memory requests properly

  • Start with small scaling limits and adjust

  • Monitor performance regularly

  • Use multiple metrics when possible

Summary

Configuring autoscaling in Kubernetes using Horizontal Pod Autoscaler helps your application automatically adjust to changing traffic conditions, improving performance and reducing costs. By properly setting CPU and memory requests, installing Metrics Server, and defining clear scaling policies using YAML or kubectl, you can build a reliable and scalable system. Understanding how HPA works and avoiding common mistakes ensures that your Kubernetes autoscaling setup remains efficient, stable, and production-ready.