Node.js API Rate Limiting Explained: Token Bucket & Leaky Bucket Techniques

🚀 Introduction

Rate limiting protects your API from abuse and smooths out spikes by controlling how many requests a client can make in a time period. Without it, a noisy neighbor (or a bug) can overwhelm your server, increase costs, and degrade experience for everyone. In Node.js, you typically add rate limiting as Express middleware and choose an algorithm that fits your traffic patterns.

🤔 Why Rate Limit? (In Simple Terms)

  • Fairness: Prevent one user from hogging resources.
  • Stability: Avoid sudden traffic spikes that crash servers.
  • Security: Mitigate brute‑force login attempts and scraping.
  • Cost Control: Keep bandwidth and compute costs predictable.

🧠 Core Ideas You’ll Use

  • Identity (the key): How you group requests (e.g., by IP, API key, user ID).
  • Allowance: How many requests are allowed per window or per second.
  • Storage: Where you remember counts/tokens (in‑memory for a single instance; Redis for a cluster).
  • Backoff/Signals: How the client should slow down (HTTP 429 + headers like Retry-After).
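
For example, a well-behaved client can read those signals and pause before retrying. A minimal client-side sketch, assuming Node 18+ for the built-in fetch (the callApi helper is illustrative, not from any library):

// client/backoff.js (illustrative sketch)
async function callApi(url, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;

    // Respect the server's Retry-After header (seconds); fall back to 1s if it is missing
    const retryAfter = Number(res.headers.get('retry-after')) || 1;
    await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
  }
  throw new Error('Rate limited: retries exhausted');
}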

🧮 Algorithm Overview (When to Use What)

  • Fixed Window Counter: Simple. “100 requests every 60s.” Can burst at window edges.
  • Sliding Window (Log or Rolling): Smoother than fixed. More accurate but heavier.
  • Token Bucket: Allows short bursts but enforces an average rate. Great for user‑facing APIs.
  • Leaky Bucket (Queue/Drip): Smooth, constant outflow; good when you must strictly pace downstream systems.

🧱 Baseline: Fixed Window Counter (In‑Memory)

Good as a learning step or for single‑process dev environments.

// middleware/fixedWindowLimiter.js
const WINDOW_MS = 60_000; // 60 seconds
const MAX_REQUESTS = 100; // per window per key

const store = new Map(); // key -> { count, windowStart }

function getKey(req) {
  return req.ip; // or req.headers['x-api-key'], req.user.id, etc.
}

module.exports = function fixedWindowLimiter(req, res, next) {
  const key = getKey(req);
  const now = Date.now();
  const entry = store.get(key) || { count: 0, windowStart: now };

  if (now - entry.windowStart >= WINDOW_MS) {
    entry.count = 0;
    entry.windowStart = now;
  }

  entry.count += 1;
  store.set(key, entry);

  const remaining = Math.max(0, MAX_REQUESTS - entry.count);
  res.setHeader('X-RateLimit-Limit', MAX_REQUESTS);
  res.setHeader('X-RateLimit-Remaining', remaining);
  res.setHeader('X-RateLimit-Reset', Math.ceil((entry.windowStart + WINDOW_MS) / 1000));

  if (entry.count > MAX_REQUESTS) {
    res.setHeader('Retry-After', Math.ceil((entry.windowStart + WINDOW_MS - now) / 1000));
    return res.status(429).json({ error: 'Too Many Requests' });
  }

  next();
};
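
The fixed window's weakness is the edge burst mentioned above: a client can send 100 requests at the end of one window and 100 more at the start of the next. A sliding window log avoids this by counting timestamps over a rolling period. A minimal in-memory sketch (file name and constants are illustrative):

// middleware/slidingWindowLimiter.js (illustrative sketch of a sliding window log)
const WINDOW_MS = 60_000;  // rolling 60-second window
const MAX_REQUESTS = 100;  // per rolling window per key

const logs = new Map(); // key -> array of request timestamps inside the window

module.exports = function slidingWindowLimiter(req, res, next) {
  const key = req.ip;
  const now = Date.now();
  // Keep only timestamps still inside the rolling window (oldest first)
  const timestamps = (logs.get(key) || []).filter((t) => now - t < WINDOW_MS);

  if (timestamps.length >= MAX_REQUESTS) {
    logs.set(key, timestamps);
    res.setHeader('Retry-After', Math.ceil((timestamps[0] + WINDOW_MS - now) / 1000));
    return res.status(429).json({ error: 'Too Many Requests' });
  }

  timestamps.push(now);
  logs.set(key, timestamps);
  next();
};

The trade-off is memory: every timestamp in the window is stored per key, which is why the overview above calls it more accurate but heavier.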

🪙 Token Bucket (Burst‑friendly Average Rate)

How it works: You have a bucket that slowly refills with tokens (e.g., 5 tokens/second) up to a max capacity (burst). Each request consumes a token. No tokens? The request is limited.

// middleware/tokenBucketLimiter.js
const RATE_PER_SEC = 5;      // refill speed
const BURST_CAPACITY = 20;   // max tokens

const buckets = new Map();   // key -> { tokens, lastRefill }

function getKey(req) { return req.ip; }

module.exports = function tokenBucketLimiter(req, res, next) {
  const key = getKey(req);
  const now = Date.now();
  let bucket = buckets.get(key);
  if (!bucket) {
    bucket = { tokens: BURST_CAPACITY, lastRefill: now };
    buckets.set(key, bucket);
  }

  // Refill based on elapsed time
  const elapsedSec = (now - bucket.lastRefill) / 1000;
  bucket.tokens = Math.min(BURST_CAPACITY, bucket.tokens + elapsedSec * RATE_PER_SEC);
  bucket.lastRefill = now;

  if (bucket.tokens >= 1) {
    bucket.tokens -= 1; // consume for this request
    res.setHeader('X-RateLimit-Policy', `${RATE_PER_SEC}/sec; burst=${BURST_CAPACITY}`);
    res.setHeader('X-RateLimit-Tokens', Math.floor(bucket.tokens));
    return next();
  }

  const needed = 1 - bucket.tokens;
  const waitSeconds = needed / RATE_PER_SEC;
  res.setHeader('Retry-After', Math.ceil(waitSeconds));
  return res.status(429).json({ error: 'Too Many Requests' });
};

When to use: You want to permit quick bursts (nice UX) but keep a sustained average.

🪣 Leaky Bucket (Constant Outflow) 

How it works: Requests enter a queue (the bucket). They “leak” at a fixed rate. If the bucket is full, you reject or drop new requests.

// middleware/leakyBucketLimiter.js
const LEAK_RATE_PER_SEC = 5;    // how many requests per second can pass
const BUCKET_CAPACITY = 50;     // max queued requests

const buckets = new Map();      // key -> { queue, lastLeak }

function getKey(req) { return req.ip; }

module.exports = function leakyBucketLimiter(req, res, next) {
  const key = getKey(req);
  const now = Date.now();
  let bucket = buckets.get(key);
  if (!bucket) {
    bucket = { queue: 0, lastLeak: now };
    buckets.set(key, bucket);
  }

  // Leak based on elapsed time; advance lastLeak only by the whole leaks applied,
  // so fractional progress toward the next leak isn't thrown away
  const elapsedSec = (now - bucket.lastLeak) / 1000;
  const leaked = Math.floor(elapsedSec * LEAK_RATE_PER_SEC);
  if (leaked > 0) {
    bucket.queue = Math.max(0, bucket.queue - leaked);
    bucket.lastLeak += (leaked / LEAK_RATE_PER_SEC) * 1000;
  }

  if (bucket.queue >= BUCKET_CAPACITY) {
    res.setHeader('Retry-After', 1);
    return res.status(429).json({ error: 'Too Many Requests (bucket full)' });
  }

  bucket.queue += 1; // enqueue this request
  // In practice, you would defer processing; for middleware demo we let it pass immediately
  next();
};

When to use: You must strictly pace downstream dependencies (e.g., payment gateway rate caps).
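
The comment inside the middleware above notes that a real leaky bucket defers work rather than letting requests through immediately. A minimal sketch of that queued variant, draining one request per tick (illustrative only: the queue here is global rather than per key, and a production version would also need per-request timeouts):

// middleware/queuedLeakyBucket.js (illustrative sketch of a truly queued leaky bucket)
const LEAK_RATE_PER_SEC = 5;
const BUCKET_CAPACITY = 50;

const queue = []; // pending { req, res, next } entries, drained at a fixed rate

// Drain one queued request every 200ms (5 per second)
setInterval(() => {
  const item = queue.shift();
  if (item) item.next(); // let the request continue down the middleware chain
}, 1000 / LEAK_RATE_PER_SEC);

module.exports = function queuedLeakyBucket(req, res, next) {
  if (queue.length >= BUCKET_CAPACITY) {
    res.setHeader('Retry-After', Math.ceil(queue.length / LEAK_RATE_PER_SEC));
    return res.status(429).json({ error: 'Too Many Requests (bucket full)' });
  }
  queue.push({ req, res, next }); // held here until the drip timer releases it
};

Because requests are delayed instead of rejected, watch client timeouts and cap how long a request may sit in the queue.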

🧩 Wiring It Up in Express

// server.js
const express = require('express');
const fixedWindowLimiter = require('./middleware/fixedWindowLimiter');
const tokenBucketLimiter = require('./middleware/tokenBucketLimiter');
// const leakyBucketLimiter = require('./middleware/leakyBucketLimiter');

const app = express();

// Example: apply global limiter
app.use(tokenBucketLimiter);

// Or apply per‑route
app.get('/public', fixedWindowLimiter, (req, res) => res.send('ok'));
app.get('/payments', /* leakyBucketLimiter, */ (req, res) => res.send('paid'));

app.listen(3000, () => console.log('API on :3000'));

🧰 Production‑Ready Storage with Redis

In clustered or serverless environments, in‑memory maps don’t work across instances. Use a shared store like Redis to coordinate limits.

// middleware/redisTokenBucket.js
const IORedis = require('ioredis');
const redis = new IORedis(process.env.REDIS_URL);

const RATE_PER_SEC = 10;
const BURST_CAPACITY = 40;

function keyFor(clientKey) { return `rl:tb:${clientKey}`; }

module.exports = async function redisTokenBucket(req, res, next) {
  try {
    const clientKey = req.ip; // replace with API key or user id in real apps
    const now = Date.now();
    const k = keyFor(clientKey);

    // Read bucket state (note: this read-modify-write is not atomic across concurrent requests; see the Lua sketch below)
    const data = await redis.hmget(k, 'tokens', 'lastRefill');
    let tokens = parseFloat(data[0]);
    let lastRefill = parseInt(data[1], 10);

    if (Number.isNaN(tokens)) tokens = BURST_CAPACITY;
    if (Number.isNaN(lastRefill)) lastRefill = now;

    const elapsedSec = (now - lastRefill) / 1000;
    tokens = Math.min(BURST_CAPACITY, tokens + elapsedSec * RATE_PER_SEC);

    if (tokens >= 1) {
      tokens -= 1;
      await redis.hmset(k, 'tokens', tokens, 'lastRefill', now);
      await redis.expire(k, Math.ceil(BURST_CAPACITY / RATE_PER_SEC) + 60);
      res.setHeader('X-RateLimit-Policy', `${RATE_PER_SEC}/sec; burst=${BURST_CAPACITY}`);
      res.setHeader('X-RateLimit-Tokens', Math.floor(tokens));
      return next();
    }

    const needed = 1 - tokens;
    const waitSeconds = needed / RATE_PER_SEC;
    res.setHeader('Retry-After', Math.ceil(waitSeconds));
    return res.status(429).json({ error: 'Too Many Requests' });
  } catch (err) {
    // Fail‑open or fail‑closed? Choose policy. Here we fail‑open so API stays usable.
    console.error('Rate limiter error', err);
    next();
  }
};
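
One caveat: because the read and write above are separate steps, two concurrent requests for the same key can read the same token count and both pass. Redis can run the whole check-and-decrement atomically with a small Lua script. A hedged sketch, assuming Redis 4+ (the script and the redisTokenBucketAtomic name are illustrative, not from a library):

// middleware/redisTokenBucketAtomic.js (illustrative sketch: atomic token bucket via Lua)
const IORedis = require('ioredis');
const redis = new IORedis(process.env.REDIS_URL);

const RATE_PER_SEC = 10;
const BURST_CAPACITY = 40;

// Runs atomically inside Redis: refill, try to take one token, persist state
// KEYS[1] = bucket key, ARGV = [rate, capacity, nowMs, ttlSeconds]
const TOKEN_BUCKET_LUA = `
  local tokens = tonumber(redis.call('HGET', KEYS[1], 'tokens')) or tonumber(ARGV[2])
  local lastRefill = tonumber(redis.call('HGET', KEYS[1], 'lastRefill')) or tonumber(ARGV[3])
  local elapsed = (tonumber(ARGV[3]) - lastRefill) / 1000
  tokens = math.min(tonumber(ARGV[2]), tokens + elapsed * tonumber(ARGV[1]))
  local allowed = 0
  if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
  end
  redis.call('HSET', KEYS[1], 'tokens', tostring(tokens), 'lastRefill', ARGV[3])
  redis.call('EXPIRE', KEYS[1], ARGV[4])
  return allowed
`;

module.exports = async function redisTokenBucketAtomic(req, res, next) {
  try {
    const key = `rl:tb:${req.ip}`;
    const ttl = Math.ceil(BURST_CAPACITY / RATE_PER_SEC) + 60;
    const allowed = await redis.eval(TOKEN_BUCKET_LUA, 1, key, RATE_PER_SEC, BURST_CAPACITY, Date.now(), ttl);
    if (allowed === 1) return next();
    res.setHeader('Retry-After', 1);
    return res.status(429).json({ error: 'Too Many Requests' });
  } catch (err) {
    console.error('Rate limiter error', err);
    next(); // fail-open, matching the policy above
  }
};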

🧪 Testing Your Limiter (Quick Ideas)

  • Unit tests: Simulate timestamps and assert counters/tokens (see the sketch after this list).
  • Load tests: Use autocannon or k6 to verify 429 rates, latencies, and headers.
  • Chaos tests: Kill Redis or introduce latency—does your API fail open or closed?
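
For the first idea, here is a minimal unit test sketch using Node's built-in test runner (Node 18+). Date.now is monkey-patched so refill time can be simulated deterministically; the path and limits assume the in-memory token bucket above (5 tokens/sec, burst 20):

// test/tokenBucket.test.js (illustrative sketch)
const test = require('node:test');
const assert = require('node:assert');
const tokenBucketLimiter = require('../middleware/tokenBucketLimiter');

// Minimal stand-in for an Express response
function mockRes() {
  return {
    statusCode: 200,
    headers: {},
    setHeader(name, value) { this.headers[name] = value; },
    status(code) { this.statusCode = code; return this; },
    json(body) { this.body = body; return this; },
  };
}

test('burst is allowed, then refill unlocks more requests', () => {
  const realNow = Date.now;
  let fakeNow = realNow();
  Date.now = () => fakeNow; // freeze time so refill is fully controlled by the test

  try {
    const req = { ip: '203.0.113.7' };

    // The burst capacity (20) passes; the 21st request is limited
    for (let i = 0; i < 20; i++) {
      const res = mockRes();
      tokenBucketLimiter(req, res, () => {});
      assert.strictEqual(res.statusCode, 200);
    }
    const limited = mockRes();
    tokenBucketLimiter(req, limited, () => {});
    assert.strictEqual(limited.statusCode, 429);

    // One simulated second refills 5 tokens, so requests pass again
    fakeNow += 1000;
    const afterRefill = mockRes();
    tokenBucketLimiter(req, afterRefill, () => {});
    assert.strictEqual(afterRefill.statusCode, 200);
  } finally {
    Date.now = realNow; // always restore the real clock
  }
});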

🧾 Helpful HTTP Headers

Return clear metadata so clients can self‑throttle:

  • X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
  • Retry-After on 429
  • (Optional, standardized) RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset

✅ Best Practices & Tips

  • Choose the key wisely: Prefer API key/user ID over raw IP (NATs/proxies share IPs).
  • Protect sensitive routes more: e.g., limit logins to 5/min per user and per IP (see the factory sketch after this list).
  • Combine with caching & auth: Rate limit after auth to identify the true principal.
  • Use Redis for scale: In‑memory only works on a single instance.
  • Expose headers & docs: Tell clients how to back off.
  • Observe: Log 429s, export metrics (Prometheus) and set alerts.
  • Legal & UX: Don’t silently drop; return 429 with guidance.
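
To illustrate the "protect sensitive routes more" point, the in-memory limiters above can be turned into small factories so each route gets its own policy and key. A hedged sketch (createFixedWindowLimiter is illustrative, not an existing module):

// middleware/createFixedWindowLimiter.js (illustrative factory sketch)
module.exports = function createFixedWindowLimiter({ windowMs, max, keyFn }) {
  const store = new Map(); // key -> { count, windowStart }

  return function limiter(req, res, next) {
    const key = keyFn(req);
    const now = Date.now();
    const entry = store.get(key) || { count: 0, windowStart: now };

    if (now - entry.windowStart >= windowMs) {
      entry.count = 0;
      entry.windowStart = now;
    }

    entry.count += 1;
    store.set(key, entry);

    if (entry.count > max) {
      res.setHeader('Retry-After', Math.ceil((entry.windowStart + windowMs - now) / 1000));
      return res.status(429).json({ error: 'Too Many Requests' });
    }
    next();
  };
};

// Usage sketch: a strict login policy, layered per IP and per user
// const loginByIp = createFixedWindowLimiter({ windowMs: 60_000, max: 5, keyFn: (req) => req.ip });
// const loginByUser = createFixedWindowLimiter({ windowMs: 60_000, max: 5, keyFn: (req) => req.body?.username ?? req.ip });
// app.post('/login', express.json(), loginByIp, loginByUser, loginHandler);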

🧭 Choosing an Algorithm (Cheat Sheet)

  • Public API with bursts OK: Token Bucket
  • Strict pacing to external vendor: Leaky Bucket
  • Simple per‑minute cap: Fixed/Sliding Window
  • High accuracy under spiky traffic: Sliding Window (rolling)

📌 Summary

Rate limiting is essential for reliable Node.js APIs. Start by defining who you limit (key), how much (policy), and where you store state (Redis for multi‑instance). Pick an algorithm that matches your needs: fixed/sliding windows for simplicity, a token bucket for burst‑friendly average rates, or a leaky bucket for steady pacing. Implement as Express middleware, return helpful headers, test under load, and monitor 429s. With these patterns, your API stays fast, fair, and resilient—even during traffic spikes.