What Are Rate Limiting Algorithms (Token Bucket vs Leaky Bucket) and How Do They Work?

Introduction

In modern web applications, APIs and services must handle thousands or even millions of requests. Without control, excessive requests can overload servers, degrade performance, or even cause downtime. This is where rate limiting algorithms come into play.

Rate limiting helps control how many requests a user or system can make within a specific time period. It is widely used in API security, backend systems, and cloud applications.

This guide explains rate limiting algorithms in simple terms, focusing on Token Bucket vs Leaky Bucket: how each one works and when to use it.

What is Rate Limiting?

Rate limiting is a technique used to restrict the number of requests a client can make to a server within a given time.

Explanation:

  • Limit how often a user can send requests

  • Prevent server overload

  • Ensure fair usage

Real-life example:

Think of a water tap that allows only a certain amount of water per minute. If you try to exceed that limit, the flow is restricted.

Why Rate Limiting is Important in Web Applications

Rate limiting is essential for:

  • Preventing API abuse

  • Helping to mitigate DDoS attacks

  • Ensuring fair usage among users

  • Improving system stability

Before rate limiting:

  • Unlimited requests

  • High server load

After rate limiting:

  • Controlled traffic

  • Better performance

Types of Rate Limiting Algorithms

Two of the most popular algorithms are:

  • Token Bucket Algorithm

  • Leaky Bucket Algorithm

Token Bucket Algorithm

The Token Bucket algorithm allows requests based on available tokens in a bucket.

How Token Bucket Works

Step 1: Create a Bucket

A bucket is created that can hold a fixed number of tokens.

Step 2: Add Tokens at a Fixed Rate

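Tokens are added to the bucket at a constant rate (e.g., 1 token per second). Once the bucket is full, extra tokens are discarded, so the bucket size caps the largest possible burst.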

Step 3: Consume Tokens for Each Request

Each incoming request consumes one token.

Step 4: Allow or Reject Request

  • If a token is available → the request is allowed

  • If no token is available → the request is rejected or delayed

Simple understanding:

You can make requests only if you have tokens.

Real-life example:

You get 10 coupons per minute. Each action costs 1 coupon. If you run out of coupons, you must wait for more.
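
The four steps above can be sketched in a few lines of Python. This is a minimal, single-process sketch, not a production implementation. The class name `TokenBucket`, the method `allow`, and the parameters `capacity` and `refill_rate` are names chosen for this example. Tokens are refilled lazily on each call instead of by a background timer, which is one common way to do it.

```python
import time


class TokenBucket:
    """Token bucket: `capacity` caps the burst, `refill_rate` is tokens per second."""

    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return True and consume one token if one is available."""
        now = time.monotonic() if now is None else now
        # Steps 1-2: add the tokens earned since the last call, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        # Steps 3-4: spend one token if possible, otherwise reject.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Timestamps are passed in explicitly so the example is deterministic.
bucket = TokenBucket(capacity=3, refill_rate=1.0, now=0.0)
burst = [bucket.allow(now=0.0) for _ in range(4)]  # three allowed, fourth rejected
later = bucket.allow(now=1.0)                      # one token refilled after 1 second
```

Here a burst of three requests passes at once because the bucket starts full, and the fourth must wait until a new token arrives.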

Advantages of Token Bucket

  • Allows burst traffic

  • Flexible and efficient

  • Good for APIs and microservices

Disadvantages of Token Bucket

  • Slightly complex implementation

  • Requires token management

Leaky Bucket Algorithm

The Leaky Bucket algorithm processes requests at a fixed rate, regardless of incoming traffic.

How Leaky Bucket Works

Step 1: Incoming Requests Enter Bucket

Requests are added to a queue (bucket).

Step 2: Process Requests at Constant Rate

Requests are processed at a fixed rate (e.g., 1 request per second).

Step 3: Handle Overflow

If the bucket (queue) is full:

  • New requests are dropped

Simple understanding:

Requests flow out steadily like water leaking from a bucket.

Real-life example:

Imagine a funnel that releases water slowly. Even if you pour fast, output remains constant.
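
The queue-based behavior described above could look like this in Python. It is a rough single-process sketch under some assumptions: the names `LeakyBucket`, `offer`, and `drain` are made up for this example, timestamps are passed in by the caller, and some worker is assumed to call `drain` regularly.

```python
from collections import deque


class LeakyBucket:
    """Leaky bucket as a bounded queue drained at `leak_rate` requests per second."""

    def __init__(self, capacity, leak_rate, now=0.0):
        self.capacity = capacity
        self.interval = 1.0 / leak_rate  # seconds between two processed requests
        self.queue = deque()
        self.next_leak = now             # earliest time the next request may leave

    def offer(self, request, now):
        """Steps 1 and 3: enqueue the request, or drop it if the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False
        if not self.queue:
            # The bucket was idle: unused leak slots are not saved up.
            self.next_leak = max(self.next_leak, now)
        self.queue.append(request)
        return True

    def drain(self, now):
        """Step 2: return the requests that are due for processing by time `now`."""
        released = []
        while self.queue and self.next_leak <= now:
            released.append(self.queue.popleft())
            self.next_leak += self.interval
        return released


bucket = LeakyBucket(capacity=3, leak_rate=1.0)
accepted = [bucket.offer(f"req{i}", now=0.0) for i in range(5)]  # burst of 5 at t=0
out_t0 = bucket.drain(now=0.0)
out_t1 = bucket.drain(now=1.0)
out_t2 = bucket.drain(now=2.0)
```

Of the five requests that arrive together, three fit in the bucket and two are dropped. The three accepted requests then leave one per second, no matter how fast they came in.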

Advantages of Leaky Bucket

  • Smooth request rate

  • Prevents sudden spikes

  • Easy to understand

Disadvantages of Leaky Bucket

  • Does not let bursts through: output never exceeds the fixed rate

  • Can drop requests under heavy load, and queued requests wait longer

Token Bucket vs Leaky Bucket Comparison

Token Bucket:

  • Allows bursts

  • Flexible request handling

  • Better for user-driven traffic

Leaky Bucket:

  • Fixed output rate

  • Smooth traffic control

  • Better for steady processing systems

Simple difference:

  • Token Bucket = flexible

  • Leaky Bucket = strict

When to Use Token Bucket vs Leaky Bucket

Use Token Bucket when:

  • You need burst handling

  • APIs receive unpredictable traffic

  • User experience is important

Use Leaky Bucket when:

  • You need steady request flow

  • System must avoid spikes

  • Processing must be consistent

Real-World Use Cases

  • API rate limiting (Token Bucket)

  • Network traffic shaping (Leaky Bucket)

  • Cloud services and gateways

  • Web security systems

Advantages of Rate Limiting Algorithms

  • Protects servers from overload

  • Prevents abuse and attacks

  • Improves reliability

Disadvantages and Challenges

  • May block legitimate users

  • Requires tuning based on traffic

  • Implementation complexity

Real-world mistake:

Setting very strict limits can negatively impact user experience.

Best Practices for Rate Limiting

  • Choose algorithm based on use case

  • Monitor traffic patterns

  • Use dynamic limits

  • Provide proper error responses (429 Too Many Requests)

Sliding Window Algorithm Comparison (Advanced)

The Sliding Window algorithm is another popular rate limiting technique used in modern distributed systems.

Simple idea:

Instead of fixed intervals, it tracks requests over a moving time window.

Example:

  • Limit = 10 requests per minute

  • Instead of resetting every minute, it checks the last 60 seconds continuously

How it works:

  • Track the timestamp of each request

  • Remove requests older than the time window

  • Count current requests

  • Allow or reject based on limit
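
The steps above can be sketched as a "sliding window log". This is a minimal in-memory sketch. The names `SlidingWindowLog` and `allow` are chosen for this example, and it records only accepted requests, which is one possible design choice (rejected requests do not count against the client).

```python
from collections import deque


class SlidingWindowLog:
    """Sliding window log: at most `limit` requests in any `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.timestamps = deque()  # accepted request times, oldest first

    def allow(self, now):
        # Remove requests that are older than the window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        # Count what is left and compare it with the limit.
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


limiter = SlidingWindowLog(limit=10, window=60.0)
first_ten = [limiter.allow(now=float(t)) for t in range(10)]  # t = 0..9 seconds
at_30s = limiter.allow(now=30.0)  # rejected: 10 requests in the last 60 seconds
at_60s = limiter.allow(now=60.0)  # allowed: the request from t=0 has expired
```

The cost of this accuracy is memory: the log keeps one timestamp per accepted request inside the window.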

Simple understanding:

Sliding Window = more accurate than a fixed window, because there is no reset boundary where a client can double up on requests

Comparison:

  • Token Bucket → allows bursts

  • Leaky Bucket → smooth output

  • Sliding Window → accurate and fair control

Use case:

Best for APIs where fairness and precision are important

Distributed Rate Limiting (Redis, API Gateway)

In real-world systems, applications run on multiple servers. Rate limiting must work across all instances.

This is called distributed rate limiting.

How it works:

  • Store request counts in a shared system (like Redis)

  • All servers check and update the same data

Why Redis?

  • Fast in-memory store

  • Supports atomic operations

  • Ideal for counters and rate limiting

Example flow:

  • User sends request

  • Server increments and checks the shared Redis counter in one atomic step

  • If the limit is not exceeded → allow the request

  • Otherwise → reject the request (429 error)
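
The flow above can be sketched as follows. To keep the example runnable without a Redis server, it uses `InMemoryCounterStore`, a hypothetical stand-in written for this example. Its `incr` and `expire` methods mirror the Redis INCR and EXPIRE commands, but check your Redis client's documentation before swapping in a real connection. The key format and the function name `is_allowed` are also assumptions.

```python
class InMemoryCounterStore:
    """Hypothetical stand-in for a shared store such as Redis (illustration only)."""

    def __init__(self):
        self.values = {}

    def incr(self, key):
        # Increment and return the new value, like the Redis INCR command.
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def expire(self, key, seconds):
        pass  # a real store would delete the key after `seconds`


def is_allowed(store, user_id, limit, window_seconds, now):
    """Fixed-window check against a shared counter keyed by user and window."""
    window_id = int(now // window_seconds)
    key = f"ratelimit:{user_id}:{window_id}"
    # Increment first, then compare: there is no gap between "check" and "update"
    # in which two servers could both see the same old count.
    count = store.incr(key)
    if count == 1:
        store.expire(key, window_seconds * 2)  # let old windows clean themselves up
    return count <= limit


store = InMemoryCounterStore()  # shared by "server A" and "server B" below
server_a = [is_allowed(store, "user-1", 3, 60, now=5.0) for _ in range(2)]
server_b = [is_allowed(store, "user-1", 3, 60, now=6.0) for _ in range(2)]
```

Both servers update the same counter, so the fourth request is rejected even though each server saw only two requests.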

API Gateway usage:

Tools like AWS API Gateway or NGINX can enforce rate limits at the edge before requests reach your backend.

Benefits:

  • Centralized control

  • Reduces backend load

  • Improves security

Real-World Architecture for Rate Limiting

In production systems, rate limiting is implemented at multiple layers.

Typical architecture:

  • Client → API Gateway → Application Server → Database

Rate limiting can be applied at:

  1. API Gateway (first layer)

  • Handles most traffic control

  • Example: AWS API Gateway, NGINX

  2. Application Layer

  • Business-level rate limits

  • User-specific rules

  3. Distributed Cache (Redis)

  • Shared counters across servers

Simple flow:

Request → API Gateway checks limit → Redis updates counter → App processes request

Real-world example:

In a public API:

  • Free users → 100 requests/min

  • Premium users → 1000 requests/min
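
A per-plan limit like this can be expressed as a small lookup table in front of a counter. The sketch below is illustrative only: `PLAN_LIMITS`, `PlanRateLimiter`, and the fallback to the free tier for unknown plans are assumptions made for this example, and the counters live in local memory instead of a shared store.

```python
from collections import defaultdict

# Requests per minute for each plan, taken from the example above.
PLAN_LIMITS = {"free": 100, "premium": 1000}


class PlanRateLimiter:
    """Per-user fixed-window counter whose limit depends on the user's plan."""

    def __init__(self, window=60.0):
        self.window = window
        # (user_id, window number) -> requests seen; old windows are never
        # cleaned up here, which a real system would need to handle.
        self.counts = defaultdict(int)

    def allow(self, user_id, plan, now):
        limit = PLAN_LIMITS.get(plan, PLAN_LIMITS["free"])  # unknown plan: free tier
        key = (user_id, int(now // self.window))
        if self.counts[key] >= limit:
            return False
        self.counts[key] += 1
        return True


limiter = PlanRateLimiter()
free_ok = sum(limiter.allow("alice", "free", now=1.0) for _ in range(150))
premium_ok = sum(limiter.allow("bob", "premium", now=1.0) for _ in range(150))
```

With 150 requests each in the same minute, the free user gets 100 through and the premium user gets all 150.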

Additional Important Concepts

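Fixed Window Algorithm:

  • Simple, but less accurate than a sliding window

  • The counter resets at the end of each time window, so a client can send up to twice the limit in a short span around a window boundary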
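
A small sketch makes the reset behavior concrete. The class name `FixedWindowCounter` and its parameters are chosen for this example, and windows are assumed to be aligned to multiples of the window length.

```python
class FixedWindowCounter:
    """Fixed window: count requests per aligned window, reset at each boundary."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.current_window = None
        self.count = 0

    def allow(self, now):
        window_id = int(now // self.window)  # e.g. the minute number since t = 0
        if window_id != self.current_window:
            self.current_window = window_id  # boundary crossed: reset the counter
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


limiter = FixedWindowCounter(limit=10, window=60.0)
end_of_window = [limiter.allow(now=59.0) for _ in range(10)]
start_of_next = [limiter.allow(now=60.0) for _ in range(10)]
# All 20 requests pass although they arrive about one second apart.
```

This is the accuracy problem of the fixed window: the limit is 10 per minute, yet 20 requests get through around the boundary.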

HTTP Status Code for Rate Limiting:

  • 429 Too Many Requests

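Commonly used headers (a widespread convention, not a formal standard, so exact names and meanings vary by provider):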

  • X-RateLimit-Limit

  • X-RateLimit-Remaining

  • X-RateLimit-Reset

These tell clients what the limit is, how many requests remain, and when the limit resets.
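
As a rough illustration, a server could build the status code and headers like this. The function name `rate_limit_response` and its parameters are hypothetical. The sketch assumes `X-RateLimit-Reset` carries a Unix timestamp, although some providers send the number of seconds remaining instead. It also adds the standard `Retry-After` header to rejected responses, which is not in the list above.

```python
import math


def rate_limit_response(allowed, limit, remaining, reset_epoch, now_epoch):
    """Return (status_code, headers) describing the caller's rate limit state."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),  # assumed: Unix time of the reset
    }
    if allowed:
        return 200, headers
    # Retry-After is a standard HTTP header; here it carries whole seconds to wait.
    headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now_epoch)))
    return 429, headers


status, headers = rate_limit_response(
    allowed=False,
    limit=100,
    remaining=0,
    reset_epoch=1_700_000_060,
    now_epoch=1_700_000_012.5,
)
```

A well-behaved client can read these values and pause until the reset time instead of retrying immediately.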

Summary

Rate limiting algorithms like Token Bucket and Leaky Bucket play a crucial role in controlling traffic and protecting web applications from overload and abuse. The Token Bucket algorithm allows burst traffic and suits dynamic systems like APIs. The Leaky Bucket algorithm enforces a steady, controlled flow of requests, which suits systems that require consistent processing. The right choice depends on your application’s needs, traffic patterns, and performance goals. Implemented well, rate limiting improves security, scalability, and overall system reliability.