
What is Rate Limiting in APIs and How to Implement It?

Introduction

In modern web development, APIs (Application Programming Interfaces) are everywhere, from mobile apps to web applications and cloud services. As usage grows, an API can receive thousands or even millions of requests in a short period.

If not controlled properly, too many requests can overload the server, slow down performance, or even crash the system. This is where rate limiting comes into play.

Rate limiting is a technique used to control how many requests a user or system can make to an API within a specific time period.

In this article, we will explain rate limiting in simple terms, explore visual diagrams of core algorithms, look at API Gateway implementations (Azure and AWS), and build a Redis-based distributed solution step by step.

What is Rate Limiting in APIs?

Rate limiting is a method used to restrict the number of API requests a client can make in a given time.

Simple Definition

"You can make only a certain number of requests in a specific time frame."

Example

  • 100 requests per minute

  • 1000 requests per hour

If the limit is exceeded, the API returns HTTP 429 (Too Many Requests).

Why is Rate Limiting Important?

  • Prevents server overload and downtime

  • Protects against abuse and DDoS attacks

  • Ensures fair usage for all users

  • Improves API performance and stability

  • Helps control infrastructure cost

How Rate Limiting Works

Step-by-Step Flow

  1. Client sends API request

  2. Server identifies the client (IP/API key/User ID)

  3. Server checks request count in the current window

  4. If within limit → process request

  5. If exceeded → reject with 429 response

Visual Diagram: Token Bucket Algorithm

Flow Representation

[Bucket Capacity: 10 Tokens]

[Tokens added every second]

[Each request consumes 1 token]

[If tokens available → allow request]
[If empty → reject request]

Simple Explanation

  • Think of a bucket filled with tokens

  • Every API call uses one token

  • Tokens refill over time

  • If bucket is empty → request is blocked

Why Use Token Bucket?

  • Allows short bursts of traffic

  • Smoothly controls long-term rate
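
The bucket described above can be sketched as a small class. This is a minimal illustration rather than a production limiter: `capacity` and `refillPerSec` are example parameters, and the current time is passed in explicitly so the refill math is easy to follow.

```javascript
// Token bucket: tokens refill continuously; bursts are allowed up to `capacity`.
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;         // maximum tokens the bucket can hold
    this.refillPerSec = refillPerSec; // tokens added per second
    this.tokens = capacity;           // start full
    this.lastRefill = 0;              // time (seconds) of the last refill
  }

  // Returns true if a request may proceed at time `now` (in seconds).
  allow(now) {
    const elapsed = now - this.lastRefill;
    // Refill proportionally to elapsed time, capped at capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1; // each request consumes one token
      return true;
    }
    return false;
  }
}
```

With a capacity of 10 and a refill rate of 1 token per second, a client can burst 10 requests at once and then sustain roughly one request per second.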

Visual Diagram: Sliding Window Algorithm

Flow Representation

Time Window → [Last 60 seconds]

Requests:
|----|----|----|----|----|

Count requests in last 60 seconds → Compare with limit

Explanation

  • Instead of fixed intervals, it checks a rolling window

  • More accurate than fixed window

  • Prevents the boundary spike problem of fixed windows, where a client can send up to twice the limit around a window reset
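
A minimal sketch of the rolling-window check above (this is the "sliding window log" variant, which keeps one timestamp per request; a production system would typically use Redis sorted sets or a counter-based approximation instead):

```javascript
// Sliding window log: counts requests in the last `windowSec` seconds.
class SlidingWindowLimiter {
  constructor(limit, windowSec) {
    this.limit = limit;
    this.windowSec = windowSec;
    this.timestamps = []; // request times (seconds), oldest first
  }

  // Returns true if a request may proceed at time `now` (in seconds).
  allow(now) {
    // Drop timestamps that fell out of the rolling window.
    while (this.timestamps.length > 0 && this.timestamps[0] <= now - this.windowSec) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.limit) {
      this.timestamps.push(now);
      return true;
    }
    return false;
  }
}
```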

Types of Rate Limiting (Quick Recap)

  • Fixed Window → Simple but less accurate

  • Sliding Window → Balanced and accurate

  • Token Bucket → Best for burst traffic

  • Leaky Bucket → Smooth constant flow
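
As a quick sketch of the last item, the leaky bucket can be modeled as a meter: water drains at a constant rate, each request adds one unit, and a request that would overflow the bucket is rejected. The `capacity` and `leakPerSec` values here are illustrative.

```javascript
// Leaky bucket (as a meter): drains at a constant rate; overflow is rejected.
class LeakyBucket {
  constructor(capacity, leakPerSec) {
    this.capacity = capacity;   // how much "water" the bucket can hold
    this.leakPerSec = leakPerSec; // drain rate per second
    this.water = 0;
    this.lastLeak = 0;          // time (seconds) of the last drain
  }

  // Returns true if a request may proceed at time `now` (in seconds).
  allow(now) {
    const elapsed = now - this.lastLeak;
    this.water = Math.max(0, this.water - elapsed * this.leakPerSec);
    this.lastLeak = now;
    if (this.water + 1 <= this.capacity) {
      this.water += 1; // each request adds one unit
      return true;
    }
    return false;
  }
}
```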

Implementing Rate Limiting Step by Step

Step 1: Choose Algorithm

  • Use Token Bucket for flexibility

  • Use Sliding Window for accuracy

Step 2: Identify Client

  • IP address

  • User ID

  • API key

Step 3: Store Request Data

  • In-memory (small apps)

  • Redis (distributed systems)

Step 4: Validate Request

  • Check count/tokens

  • Allow or reject
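
Steps 2–4 can be tied together with a minimal in-memory, fixed-window check. This sketch suits a single-process app only; the function and variable names are illustrative.

```javascript
// Fixed-window, in-memory limiter: one counter per client per window.
const windows = new Map(); // clientId -> { start, count }

// `clientId` could be an IP, user ID, or API key; `now` is in seconds.
function checkLimit(clientId, limit, windowSec, now) {
  const entry = windows.get(clientId);
  if (!entry || now - entry.start >= windowSec) {
    // New window: reset the counter for this client.
    windows.set(clientId, { start: now, count: 1 });
    return true;
  }
  entry.count += 1;
  return entry.count <= limit;
}
```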

Step 5: Return Response

{
  "error": "Too many requests",
  "status": 429
}

API Gateway Rate Limiting (Azure & AWS)

Azure API Management Example

Azure API Management allows built-in rate limiting using policies.

Example Policy

<rate-limit calls="100" renewal-period="60" />

Explanation

  • Allows 100 requests per 60 seconds

  • Applied at API gateway level

  • No code required

Benefits

  • Centralized control

  • Easy configuration

  • Scalable for enterprise apps

AWS API Gateway Example

AWS API Gateway supports throttling at the account, stage, and method level, and per client through usage plans.

Example Settings

  • Rate: 100 requests per second

  • Burst: 200 requests

Explanation

  • Rate = sustained requests per second (the token refill rate)

  • Burst = temporary spike allowance (the bucket capacity)

Benefits

  • Automatic scaling

  • Integrated with AWS services

Redis-Based Distributed Rate Limiting

In real-world microservices, multiple servers handle requests. We need a shared storage system like Redis.

Why Use Redis?

  • Fast (in-memory)

  • Supports atomic operations

  • Works across distributed systems

Step-by-Step Implementation (Node.js + Redis)

Step 1: Install Packages

npm install ioredis

Step 2: Connect Redis

const Redis = require('ioredis');
const redis = new Redis(); // connects to localhost:6379 by default

Step 3: Implement Rate Limiter

// Fixed-window counter: one Redis key per client, reset every `window` seconds.
async function rateLimiter(key, limit, window) {
  // INCR is atomic, so concurrent servers cannot miscount.
  const current = await redis.incr(key);

  // First request in this window: start the expiry timer.
  // (Caveat: if the process dies between INCR and EXPIRE, the key never
  // expires; a Lua script or MULTI/EXEC makes the two steps atomic.)
  if (current === 1) {
    await redis.expire(key, window);
  }

  // Over the limit: reject. The counter keeps incrementing, but the key
  // still expires at the end of the window.
  return current <= limit;
}

Step 4: Use in API

// Global Express middleware: 100 requests per 60 seconds per IP.
// (Behind a load balancer, enable Express's "trust proxy" setting so
// req.ip reflects the real client address.)
app.use(async (req, res, next) => {
  const allowed = await rateLimiter(req.ip, 100, 60);

  if (!allowed) {
    return res.status(429).send("Too many requests");
  }

  next();
});

How It Works

  • Redis stores request count per user

  • Counter resets after time window

  • Works across multiple servers

Real-World Use Cases

Login API

  • Limit: 5 requests/minute

  • Prevent brute-force attacks

Public APIs

  • Free tier vs premium tier limits

Payment APIs

  • Strict rate limiting for security

Best Practices for Rate Limiting

  • Use Redis for scalability

  • Apply different limits for different users

  • Return rate limit headers so clients can back off before hitting the limit

  • Log rejected requests
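
For the rate limit headers point, a small helper can build the conventional header set. The `X-RateLimit-*` names below are a widely used convention rather than an official standard (an IETF draft defines `RateLimit-*` equivalents).

```javascript
// Builds the conventional X-RateLimit-* response headers.
function rateLimitHeaders(limit, remaining, resetEpochSec) {
  return {
    'X-RateLimit-Limit': String(limit),                     // max requests per window
    'X-RateLimit-Remaining': String(Math.max(0, remaining)), // requests left (never negative)
    'X-RateLimit-Reset': String(resetEpochSec),             // when the window resets (Unix time)
  };
}
```

In the Express middleware shown earlier, this object would be passed to `res.set(...)` before sending the response.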

Common Mistakes to Avoid

  • Using same limits for all users

  • Ignoring burst traffic

  • Not handling distributed systems

Key Takeaways

  • Rate limiting protects APIs and servers

  • Token bucket and sliding window are widely used

  • API gateways simplify implementation

  • Redis enables distributed rate limiting

Summary

Rate limiting is a critical technique for building secure, scalable, and high-performance APIs. By using algorithms like token bucket and sliding window, implementing controls at API gateway level, and using Redis for distributed environments, developers can effectively manage traffic, prevent abuse, and ensure smooth user experience. Proper rate limiting not only improves performance but also strengthens API security in modern cloud-based applications.