Introduction
In modern web development, APIs (Application Programming Interfaces) are used everywhere—from mobile apps to web applications and cloud services. As the number of users increases, APIs can receive thousands or even millions of requests.
If not controlled properly, too many requests can overload the server, slow down performance, or even crash the system. This is where rate limiting comes into play.
Rate limiting is a technique used to control how many requests a user or system can make to an API within a specific time period.
In this article, we will explain rate limiting in simple terms, explore visual diagrams of core algorithms, look at API Gateway implementations (Azure and AWS), and build a Redis-based distributed solution step by step.
What is Rate Limiting in APIs?
Rate limiting is a method used to restrict the number of API requests a client can make in a given time.
Simple Definition
"You can make only a certain number of requests in a specific time frame."
Example
100 requests per minute
1000 requests per hour
If the limit is exceeded, the API returns HTTP 429 (Too Many Requests).
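On the client side, a 429 response should be handled by waiting before retrying. A minimal sketch, assuming a hypothetical helper named retryDelayMs (not a library API): it prefers the server's Retry-After header when present and otherwise falls back to capped exponential backoff.

```javascript
// Hypothetical helper: how long should a client wait after a 429?
// Prefers the server's Retry-After header (in seconds); falls back to
// exponential backoff based on how many attempts have already failed.
function retryDelayMs(retryAfterHeader, attempt) {
  const retryAfter = Number(retryAfterHeader);
  if (Number.isFinite(retryAfter) && retryAfter > 0) {
    return retryAfter * 1000; // server told us exactly how long to wait
  }
  return Math.min(1000 * 2 ** attempt, 30000); // capped exponential backoff
}
```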
Why is Rate Limiting Important?
Prevents server overload and downtime
Protects against abuse and DDoS attacks
Ensures fair usage for all users
Improves API performance and stability
Helps control infrastructure cost
How Rate Limiting Works
Step-by-Step Flow
Client sends API request
Server identifies the client (IP/API key/User ID)
Server checks request count in the current window
If within limit → process request
If exceeded → reject with 429 response
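The five-step flow above can be sketched as a minimal in-memory fixed-window check. This is a single-server illustration only (a distributed version appears later with Redis); checkLimit and counters are illustrative names, not a library API.

```javascript
// In-memory fixed-window limiter sketch (single server only).
// counters maps a client key to { count, windowStart }.
const counters = new Map();

function checkLimit(clientKey, limit, windowMs, now = Date.now()) {
  const entry = counters.get(clientKey);
  // Start a fresh window if none exists or the old one has expired.
  if (!entry || now - entry.windowStart >= windowMs) {
    counters.set(clientKey, { count: 1, windowStart: now });
    return true; // within limit -> process request
  }
  entry.count += 1;
  return entry.count <= limit; // exceeded -> caller responds with 429
}
```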
Visual Diagram: Token Bucket Algorithm
Flow Representation
[Bucket Capacity: 10 Tokens]
↓
[Tokens added every second]
↓
[Each request consumes 1 token]
↓
[If tokens available → allow request]
[If empty → reject request]
Simple Explanation
Think of a bucket filled with tokens
Every API call uses one token
Tokens refill over time
If bucket is empty → request is blocked
Why Use Token Bucket?
Allows short bursts of traffic while still enforcing an average rate
Tokens refill steadily, so well-behaved clients are rarely blocked
Cheap to store: one token count and one timestamp per client
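The bucket mechanics from the diagram above can be sketched in a few lines of JavaScript. createBucket is a hypothetical helper, not a library API; the current time is passed in explicitly so the behavior is deterministic and testable.

```javascript
// Minimal token-bucket sketch: starts full at `capacity` tokens and
// refills at refillRatePerSec tokens per second, never above capacity.
function createBucket(capacity, refillRatePerSec) {
  let tokens = capacity;
  let last = 0; // timestamp (ms) of the last refill calculation

  return function tryConsume(nowMs) {
    // Refill based on elapsed time, capped at the bucket capacity.
    tokens = Math.min(capacity, tokens + ((nowMs - last) / 1000) * refillRatePerSec);
    last = nowMs;
    if (tokens >= 1) {
      tokens -= 1; // each request consumes one token
      return true;
    }
    return false; // bucket empty -> reject request
  };
}
```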
Visual Diagram: Sliding Window Algorithm
Flow Representation
Time Window → [Last 60 seconds]
Requests:
|----|----|----|----|----|
Count requests in last 60 seconds → Compare with limit
Explanation
Instead of fixed intervals, it checks a rolling window
More accurate than fixed window
Prevents sudden spikes at boundaries
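The rolling-window idea above can be sketched as a "sliding window log": keep the timestamps of recent requests and count only those inside the last window. Exact, but it stores one entry per request; createSlidingWindow is an illustrative name, not a library API.

```javascript
// Sliding-window-log sketch: remember recent request timestamps and
// count only those inside the last windowMs.
function createSlidingWindow(limit, windowMs) {
  const timestamps = [];
  return function allow(nowMs) {
    // Drop requests that have fallen out of the rolling window.
    while (timestamps.length > 0 && nowMs - timestamps[0] >= windowMs) {
      timestamps.shift();
    }
    if (timestamps.length < limit) {
      timestamps.push(nowMs);
      return true;
    }
    return false; // window is full -> reject
  };
}
```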
Types of Rate Limiting (Quick Recap)
Fixed Window → Simple but less accurate
Sliding Window → Balanced and accurate
Token Bucket → Best for burst traffic
Leaky Bucket → Smooth constant flow
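For completeness, the leaky bucket from the recap can be sketched the same way: incoming requests fill a queue that drains ("leaks") at a fixed rate, producing a smooth constant outflow. createLeakyBucket is an illustrative name.

```javascript
// Leaky-bucket sketch: requests fill a queue of size `capacity` that
// leaks at leakRatePerSec, smoothing bursts into a constant flow.
function createLeakyBucket(capacity, leakRatePerSec) {
  let level = 0; // current queue size
  let last = 0;  // timestamp (ms) of the last leak calculation
  return function tryAdd(nowMs) {
    // Drain the bucket in proportion to elapsed time.
    level = Math.max(0, level - ((nowMs - last) / 1000) * leakRatePerSec);
    last = nowMs;
    if (level + 1 <= capacity) {
      level += 1; // request accepted into the queue
      return true;
    }
    return false; // bucket full -> reject
  };
}
```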
Implementing Rate Limiting Step by Step
Step 1: Choose Algorithm (fixed window, sliding window, token bucket, or leaky bucket)
Step 2: Identify Client (IP address, API key, or user ID)
Step 3: Store Request Data (in memory for a single server, or shared storage such as Redis)
Step 4: Validate Request
Check count/tokens
Allow or reject
Step 5: Return Response
{
"error": "Too many requests",
"status": 429
}
API Gateway Rate Limiting (Azure & AWS)
Azure API Management Example
Azure API Management allows built-in rate limiting using policies.
Example Policy
<rate-limit calls="100" renewal-period="60" />
Explanation
calls="100" → a maximum of 100 calls is allowed
renewal-period="60" → the counter resets every 60 seconds
By default, the rate-limit policy counts calls per API Management subscription
Benefits
No application code changes, limits are configured declaratively in the policy
Requests are rejected at the gateway before they reach the backend
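If limits should apply per client rather than per subscription, Azure API Management also offers a rate-limit-by-key policy, where the counter key is a policy expression. A sketch keyed by caller IP:

```xml
<rate-limit-by-key calls="100"
                   renewal-period="60"
                   counter-key="@(context.Request.IpAddress)" />
```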
AWS API Gateway Example
AWS API Gateway supports throttling.
Example Settings
Rate: 100 requests per second (steady-state limit)
Burst: 200 requests (maximum short-term spike)
Explanation
AWS API Gateway throttling uses a token bucket: the rate is the refill speed and the burst is the bucket capacity. Requests beyond these limits receive HTTP 429.
Benefits
Fully managed, with no rate-limiting code to write or operate
Limits can be set per stage, per method, or per API key via usage plans
Redis-Based Distributed Rate Limiting
In real-world microservices, multiple servers handle requests. We need a shared storage system like Redis.
Why Use Redis?
In-memory storage makes counter reads and writes very fast
Atomic commands like INCR avoid race conditions between servers
Built-in key expiry (EXPIRE) resets counters automatically
Shared by all application servers, so limits are enforced globally
Step-by-Step Implementation (Node.js + Redis)
Step 1: Install Packages
npm install ioredis
Step 2: Connect Redis
const Redis = require('ioredis');
const redis = new Redis();
Step 3: Implement Rate Limiter
async function rateLimiter(key, limit, window) {
  // INCR creates the key at 1 on the first request, then counts up.
  const current = await redis.incr(key);
  if (current === 1) {
    // First request in the window: start the expiry (window is in seconds).
    await redis.expire(key, window);
  }
  // Allow until the counter passes the limit.
  return current <= limit;
}
Step 4: Use in API
// Assumes an Express app: const app = require('express')();
app.use(async (req, res, next) => {
  // 100 requests per 60 seconds, keyed by client IP
  const allowed = await rateLimiter(req.ip, 100, 60);
  if (!allowed) {
    return res.status(429).send("Too many requests");
  }
  next();
});
How It Works
Redis stores request count per user
Counter resets after time window
Works across multiple servers
Real-World Use Cases
Login API → throttle attempts to block brute-force password guessing
Public APIs → enforce fair usage across many anonymous clients
Payment APIs → limit retries to reduce fraud and duplicate charges
Best Practices for Rate Limiting
Return HTTP 429 with a Retry-After header so clients know when to retry
Expose remaining quota in response headers (e.g. X-RateLimit-Remaining)
Apply limits per client (API key, user ID, or IP), not only globally
Use stricter limits on sensitive or expensive endpoints such as login
Common Mistakes to Avoid
Using in-memory counters behind a load balancer, so each server enforces its own limit
Returning 429 without telling the client when it may retry
Setting limits so low that normal users are blocked
Enforcing the limit after expensive work instead of at the edge
Key Takeaways
Rate limiting protects APIs and servers
Token bucket and sliding window are widely used
API gateways simplify implementation
Redis enables distributed rate limiting
Summary
Rate limiting is a critical technique for building secure, scalable, and high-performance APIs. By using algorithms like token bucket and sliding window, implementing controls at API gateway level, and using Redis for distributed environments, developers can effectively manage traffic, prevent abuse, and ensure smooth user experience. Proper rate limiting not only improves performance but also strengthens API security in modern cloud-based applications.