
What is Rate Limiting in APIs and How to Implement It?

Introduction

In modern web development, APIs (Application Programming Interfaces) are everywhere, from mobile apps to web applications and cloud services. As usage grows, an API can receive thousands or even millions of requests in a short period.

If not controlled properly, too many requests can overload the server, slow down performance, or even crash the system. This is where rate limiting comes into play.

Rate limiting is a technique used to control how many requests a user or system can make to an API within a specific time period.

In this article, we will explain rate limiting in simple terms, explore visual diagrams of core algorithms, look at API Gateway implementations (Azure and AWS), and build a Redis-based distributed solution step by step.

What is Rate Limiting in APIs?

Rate limiting is a method used to restrict the number of API requests a client can make in a given time.

Simple Definition

"You can make only a certain number of requests in a specific time frame."

Example

  • 100 requests per minute

  • 1000 requests per hour

If the limit is exceeded, the API returns HTTP 429 (Too Many Requests).

Why is Rate Limiting Important?

  • Prevents server overload and downtime

  • Protects against abuse and DDoS attacks

  • Ensures fair usage for all users

  • Improves API performance and stability

  • Helps control infrastructure cost

How Rate Limiting Works

Step-by-Step Flow

  1. Client sends API request

  2. Server identifies the client (IP/API key/User ID)

  3. Server checks request count in the current window

  4. If within limit → process request

  5. If exceeded → reject with 429 response

Visual Diagram: Token Bucket Algorithm

Flow Representation

[Bucket Capacity: 10 Tokens]

[Tokens added every second]

[Each request consumes 1 token]

[If tokens available → allow request]
[If empty → reject request]

Simple Explanation

  • Think of a bucket filled with tokens

  • Every API call uses one token

  • Tokens refill over time

  • If bucket is empty → request is blocked

Why Use Token Bucket?

  • Allows short bursts of traffic

  • Smoothly controls long-term rate
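
The bucket described above can be sketched as a small class. This is a minimal illustration rather than a production limiter: `capacity` and `refillPerSec` are example parameters, and the current time is passed in explicitly so the refill math is easy to follow.

```javascript
// Token bucket: tokens refill continuously; bursts are allowed up to `capacity`.
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;         // maximum tokens the bucket can hold
    this.refillPerSec = refillPerSec; // tokens added per second
    this.tokens = capacity;           // start full
    this.lastRefill = 0;              // time (seconds) of the last refill
  }

  // Returns true if a request may proceed at time `now` (in seconds).
  allow(now) {
    const elapsed = now - this.lastRefill;
    // Refill proportionally to elapsed time, capped at capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1; // each request consumes one token
      return true;
    }
    return false;
  }
}
```

With a capacity of 10 and a refill rate of 1 token per second, a client can burst 10 requests at once and then sustain roughly one request per second.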

Visual Diagram: Sliding Window Algorithm

Flow Representation

Time Window → [Last 60 seconds]

Requests:
|----|----|----|----|----|

Count requests in last 60 seconds → Compare with limit

Explanation

  • Instead of fixed intervals, it checks a rolling window

  • More accurate than fixed window

  • Prevents the boundary spike problem of fixed windows, where a client can send up to twice the limit around a window reset
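
A minimal sketch of the rolling-window check above (this is the "sliding window log" variant, which keeps one timestamp per request; a production system would typically use Redis sorted sets or a counter-based approximation instead):

```javascript
// Sliding window log: counts requests in the last `windowSec` seconds.
class SlidingWindowLimiter {
  constructor(limit, windowSec) {
    this.limit = limit;
    this.windowSec = windowSec;
    this.timestamps = []; // request times (seconds), oldest first
  }

  // Returns true if a request may proceed at time `now` (in seconds).
  allow(now) {
    // Drop timestamps that fell out of the rolling window.
    while (this.timestamps.length > 0 && this.timestamps[0] <= now - this.windowSec) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.limit) {
      this.timestamps.push(now);
      return true;
    }
    return false;
  }
}
```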

Types of Rate Limiting (Quick Recap)

  • Fixed Window → Simple but less accurate

  • Sliding Window → Balanced and accurate

  • Token Bucket → Best for burst traffic

  • Leaky Bucket → Smooth constant flow
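
As a quick sketch of the last item, the leaky bucket can be modeled as a meter: water drains at a constant rate, each request adds one unit, and a request that would overflow the bucket is rejected. The `capacity` and `leakPerSec` values here are illustrative.

```javascript
// Leaky bucket (as a meter): drains at a constant rate; overflow is rejected.
class LeakyBucket {
  constructor(capacity, leakPerSec) {
    this.capacity = capacity;   // how much "water" the bucket can hold
    this.leakPerSec = leakPerSec; // drain rate per second
    this.water = 0;
    this.lastLeak = 0;          // time (seconds) of the last drain
  }

  // Returns true if a request may proceed at time `now` (in seconds).
  allow(now) {
    const elapsed = now - this.lastLeak;
    this.water = Math.max(0, this.water - elapsed * this.leakPerSec);
    this.lastLeak = now;
    if (this.water + 1 <= this.capacity) {
      this.water += 1; // each request adds one unit
      return true;
    }
    return false;
  }
}
```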

Implementing Rate Limiting Step by Step

Step 1: Choose Algorithm

  • Use Token Bucket for flexibility

  • Use Sliding Window for accuracy

Step 2: Identify Client

  • IP address

  • User ID

  • API key

Step 3: Store Request Data

  • In-memory (small apps)

  • Redis (distributed systems)

Step 4: Validate Request

  • Check count/tokens

  • Allow or reject
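
Steps 2–4 can be tied together with a minimal in-memory, fixed-window check. This sketch suits a single-process app only; the function and variable names are illustrative.

```javascript
// Fixed-window, in-memory limiter: one counter per client per window.
const windows = new Map(); // clientId -> { start, count }

// `clientId` could be an IP, user ID, or API key; `now` is in seconds.
function checkLimit(clientId, limit, windowSec, now) {
  const entry = windows.get(clientId);
  if (!entry || now - entry.start >= windowSec) {
    // New window: reset the counter for this client.
    windows.set(clientId, { start: now, count: 1 });
    return true;
  }
  entry.count += 1;
  return entry.count <= limit;
}
```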

Step 5: Return Response

{
  "error": "Too many requests",
  "status": 429
}

API Gateway Rate Limiting (Azure & AWS)

Azure API Management Example

Azure API Management allows built-in rate limiting using policies.

Example Policy

<rate-limit calls="100" renewal-period="60" />

Explanation

  • Allows 100 requests per 60 seconds

  • Applied at API gateway level

  • No code required

Benefits

  • Centralized control

  • Easy configuration

  • Scalable for enterprise apps

AWS API Gateway Example

AWS API Gateway supports throttling at the account, stage, and method level, and per client through usage plans.

Example Settings

  • Rate: 100 requests per second

  • Burst: 200 requests

Explanation

  • Rate = sustained requests per second (the token refill rate)

  • Burst = temporary spike allowance (the bucket capacity)

Benefits

  • Automatic scaling

  • Integrated with AWS services

Redis-Based Distributed Rate Limiting

In real-world microservices, multiple servers handle requests. We need a shared storage system like Redis.

Why Use Redis?

  • Fast (in-memory)

  • Supports atomic operations

  • Works across distributed systems

Step-by-Step Implementation (Node.js + Redis)

Step 1: Install Packages

npm install ioredis

Step 2: Connect Redis

const Redis = require('ioredis');
const redis = new Redis(); // connects to localhost:6379 by default

Step 3: Implement Rate Limiter

// Fixed-window counter: one Redis key per client, reset every `window` seconds.
async function rateLimiter(key, limit, window) {
  // INCR is atomic, so concurrent servers cannot miscount.
  const current = await redis.incr(key);

  // First request in this window: start the expiry timer.
  // (Caveat: if the process dies between INCR and EXPIRE, the key never
  // expires; a Lua script or MULTI/EXEC makes the two steps atomic.)
  if (current === 1) {
    await redis.expire(key, window);
  }

  // Over the limit: reject. The counter keeps incrementing, but the key
  // still expires at the end of the window.
  return current <= limit;
}

Step 4: Use in API

// Global Express middleware: 100 requests per 60 seconds per IP.
// (Behind a load balancer, enable Express's "trust proxy" setting so
// req.ip reflects the real client address.)
app.use(async (req, res, next) => {
  const allowed = await rateLimiter(req.ip, 100, 60);

  if (!allowed) {
    return res.status(429).send("Too many requests");
  }

  next();
});

How It Works

  • Redis stores request count per user

  • Counter resets after time window

  • Works across multiple servers

Real-World Use Cases

Login API

  • Limit: 5 requests/minute

  • Prevent brute-force attacks

Public APIs

  • Free tier vs premium tier limits

Payment APIs

  • Strict rate limiting for security

Best Practices for Rate Limiting

  • Use Redis for scalability

  • Apply different limits for different users

  • Return rate limit headers so clients can back off before hitting the limit

  • Log rejected requests
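
For the rate limit headers point, a small helper can build the conventional header set. The `X-RateLimit-*` names below are a widely used convention rather than an official standard (an IETF draft defines `RateLimit-*` equivalents).

```javascript
// Builds the conventional X-RateLimit-* response headers.
function rateLimitHeaders(limit, remaining, resetEpochSec) {
  return {
    'X-RateLimit-Limit': String(limit),                     // max requests per window
    'X-RateLimit-Remaining': String(Math.max(0, remaining)), // requests left (never negative)
    'X-RateLimit-Reset': String(resetEpochSec),             // when the window resets (Unix time)
  };
}
```

In the Express middleware shown earlier, this object would be passed to `res.set(...)` before sending the response.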

Common Mistakes to Avoid

  • Using same limits for all users

  • Ignoring burst traffic

  • Not handling distributed systems

Key Takeaways

  • Rate limiting protects APIs and servers

  • Token bucket and sliding window are widely used

  • API gateways simplify implementation

  • Redis enables distributed rate limiting

Summary

Rate limiting is a critical technique for building secure, scalable, and high-performance APIs. By using algorithms like token bucket and sliding window, implementing controls at API gateway level, and using Redis for distributed environments, developers can effectively manage traffic, prevent abuse, and ensure smooth user experience. Proper rate limiting not only improves performance but also strengthens API security in modern cloud-based applications.