Rate limiting is a basic but important part of any system that talks to APIs. It helps protect services from overload, prevents abuse, and ensures fair usage for all clients. This article walks through why rate limiting matters, common algorithms, and practical JavaScript implementations you can drop into a server or client. Code is included so you can try things out right away.
Why rate limiting matters
When many clients hit an API at the same time, the server can slow down or crash. Rate limiting gives you control. On the server side, it prevents a single client from using all resources. On the client side, it helps you avoid being throttled by third-party APIs. In short, rate limiting improves reliability and user experience.
Common rate-limiting strategies
Here are the most common algorithms, explained in plain terms.
Fixed Window. Count requests inside fixed intervals, for example, 100 requests per minute. It is easy to implement. The downside is a bursty edge case: a client that sends a full quota just before a window boundary and another full quota just after it can briefly get through nearly twice the limit (a minimal sketch follows this list).
Sliding Window. A refinement of the fixed window. It smooths out bursts by measuring requests in a rolling time window.
Token Bucket. Tokens are added to a bucket at a fixed rate. A request consumes one token. If tokens are available, the request proceeds; otherwise, it is rejected or queued. This allows bursts up to the bucket capacity.
Leaky Bucket. Requests enter a queue and are processed at a fixed rate. It enforces a steady output rate and smooths incoming bursts.
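To make the fixed-window boundary issue concrete, here is a minimal sketch of that approach. The function name, key scheme, and limits are illustrative and are not used by the later examples.
// fixedWindow.js - minimal fixed-window counter (illustrative sketch)
const WINDOW_MS = 60 * 1000; // one-minute windows
const LIMIT = 100; // max requests per window

const counters = new Map(); // key -> { windowStart, count }

function allowRequest(key) {
  // Align the window to the clock so every client shares the same boundaries
  const windowStart = Math.floor(Date.now() / WINDOW_MS) * WINDOW_MS;
  const entry = counters.get(key);
  if (!entry || entry.windowStart !== windowStart) {
    counters.set(key, { windowStart, count: 1 });
    return true;
  }
  if (entry.count < LIMIT) {
    entry.count += 1;
    return true;
  }
  return false; // over the limit for this window
}

module.exports = allowRequest;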
Which one to pick depends on your needs. For most API client uses, the token bucket gives flexible and predictable behavior. For server-side enforcement with strict fairness, a sliding window or a leaky bucket is a good option.
Below, we will implement a compact token bucket in JavaScript, then show an Express middleware and a client-side fetch wrapper that use it.
Token bucket implementation in JavaScript
This implementation is in-memory and single-process. For distributed systems, you would use Redis or another shared store. The code is intentionally simple so you can understand the mechanics.
// tokenBucket.js
class TokenBucket {
  constructor({capacity, refillTokens, refillIntervalMs}) {
    this.capacity = capacity; // max tokens in bucket
    this.tokens = capacity; // current tokens
    this.refillTokens = refillTokens; // tokens to add each interval
    this.refillIntervalMs = refillIntervalMs; // interval in ms
    this.lastRefill = Date.now();
  }

  refill() {
    const now = Date.now();
    const elapsed = now - this.lastRefill;
    if (elapsed <= 0) return;
    const intervalsPassed = Math.floor(elapsed / this.refillIntervalMs);
    if (intervalsPassed > 0) {
      const newTokens = intervalsPassed * this.refillTokens;
      this.tokens = Math.min(this.capacity, this.tokens + newTokens);
      this.lastRefill += intervalsPassed * this.refillIntervalMs;
    }
  }

  tryRemoveTokens(count = 1) {
    this.refill();
    if (this.tokens >= count) {
      this.tokens -= count;
      return true;
    }
    return false;
  }

  getTokens() {
    this.refill();
    return this.tokens;
  }
}

module.exports = TokenBucket;
Usage note: capacity is the maximum burst allowed. refillTokens and refillIntervalMs control steady-state throughput. For example, to allow 10 requests per second with a burst of up to 20, set capacity = 20, refillTokens = 10, and refillIntervalMs = 1000.
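Concretely, that configuration looks like this:
const TokenBucket = require('./tokenBucket');

// Burst of up to 20, sustained rate of 10 tokens per second
const bucket = new TokenBucket({capacity: 20, refillTokens: 10, refillIntervalMs: 1000});

console.log(bucket.tryRemoveTokens(15)); // true - within burst capacity
console.log(bucket.tryRemoveTokens(10)); // false - only 5 tokens left
console.log(bucket.getTokens()); // 5, plus whatever has refilled in the meantime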
Server-side example - Express middleware
Below is a simple Express middleware that limits each client by IP. It creates a token bucket for each IP and rejects requests that exceed the bucket.
// rateLimitMiddleware.js
const TokenBucket = require('./tokenBucket');

const buckets = new Map();

function getBucketForIp(ip) {
  if (!buckets.has(ip)) {
    // capacity 20, refill 10 tokens every second
    buckets.set(ip, new TokenBucket({
      capacity: 20,
      refillTokens: 10,
      refillIntervalMs: 1000
    }));
  }
  return buckets.get(ip);
}

function rateLimitMiddleware(req, res, next) {
  const ip = req.ip || req.socket.remoteAddress; // req.connection is deprecated in modern Node
  const bucket = getBucketForIp(ip);
  if (bucket.tryRemoveTokens(1)) {
    return next();
  }
  res.status(429).json({
    error: 'Too many requests',
    retryAfterMs: 1000 // hint for clients
  });
}

module.exports = rateLimitMiddleware;
Place this middleware near the top of your middleware stack. For production use, you should persist buckets in Redis or similar so limits are shared across instances. Also, add a periodic cleanup routine to remove old entries from buckets to avoid memory growth; one possible shape for that cleanup follows.
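Here is a sketch of such a routine, written as additions to rateLimitMiddleware.js. The touch helper and the idle threshold are illustrative, not part of the TokenBucket class.
// Additions to rateLimitMiddleware.js - prune buckets for clients that have gone quiet (sketch)
const lastSeen = new Map(); // ip -> timestamp of the client's last request
const MAX_IDLE_MS = 10 * 60 * 1000; // forget clients idle for 10 minutes

// Record activity; call this from the middleware after looking up the bucket
function touch(ip) {
  lastSeen.set(ip, Date.now());
}

setInterval(() => {
  const now = Date.now();
  for (const [ip, seen] of lastSeen) {
    if (now - seen > MAX_IDLE_MS) {
      lastSeen.delete(ip);
      buckets.delete(ip);
    }
  }
}, 60 * 1000).unref(); // unref so the timer does not keep the process alive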
Client-side example - fetch wrapper
If you are calling a third-party API that enforces a rate limit, you can build a client wrapper that delays requests when tokens are not available. Here is a small wrapper that either queues or rejects requests.
// fetchRateLimited.js
const TokenBucket = require('./tokenBucket');

const bucket = new TokenBucket({capacity: 5, refillTokens: 5, refillIntervalMs: 1000}); // 5 req/s

function waitForToken(timeoutMs = 5000) {
  const start = Date.now();
  return new Promise((resolve, reject) => {
    function attempt() {
      if (bucket.tryRemoveTokens(1)) return resolve();
      if (Date.now() - start > timeoutMs) return reject(new Error('Rate limit wait timeout'));
      // try again after a short delay
      setTimeout(attempt, 50);
    }
    attempt();
  });
}

async function rateLimitedFetch(url, options = {}) {
  await waitForToken();
  return fetch(url, options);
}

module.exports = rateLimitedFetch;
This wrapper blocks until a token is available or until the timeout expires. It is suitable for clients that can tolerate a small amount of waiting. For low-latency needs, consider rejecting quickly and letting the caller retry with a backoff; a sketch of that approach follows.
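For that fail-fast style, a retry helper with exponential backoff and jitter might look like the sketch below. The attempt count and delays are illustrative, and it assumes the server signals rejection with a 429 status, as the middleware above does.
// fetchWithBackoff.js - retry on 429 with exponential backoff and jitter (sketch)
async function fetchWithBackoff(url, options = {}, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(url, options);
    if (res.status !== 429) return res; // success, or an error that is not rate limiting
    // Base delay doubles each attempt; random jitter spreads out synchronized clients
    const baseDelayMs = 200 * 2 ** attempt;
    const jitterMs = Math.random() * baseDelayMs;
    await new Promise(resolve => setTimeout(resolve, baseDelayMs + jitterMs));
  }
  throw new Error('Rate limit retries exhausted');
}

module.exports = fetchWithBackoff;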
Testing your rate limiter
Automated tests help ensure the limiter behaves well under bursts.
Write unit tests for the token refill logic. Simulate time progression by mocking Date.now; a minimal example follows this list.
Test the middleware by firing many concurrent requests using a load tool or a script. Verify that the number of responses with status 200 versus 429 matches expectations.
Test edge cases, such as when the server restarts or when buckets are removed from memory.
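As an example of the first point, here is a minimal refill test. It assumes Jest; any test runner that lets you stub Date.now works the same way.
// tokenBucket.test.js - refill logic with mocked time (Jest sketch)
const TokenBucket = require('./tokenBucket');

test('refills tokens as mocked time advances', () => {
  let now = 0;
  jest.spyOn(Date, 'now').mockImplementation(() => now);

  const bucket = new TokenBucket({capacity: 2, refillTokens: 1, refillIntervalMs: 1000});
  expect(bucket.tryRemoveTokens(2)).toBe(true); // burst drains the bucket
  expect(bucket.tryRemoveTokens(1)).toBe(false); // empty, no time has passed

  now = 1000; // advance one refill interval
  expect(bucket.tryRemoveTokens(1)).toBe(true); // one token refilled

  jest.restoreAllMocks();
});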
Here is a quick test script you can run against the fetch wrapper to see it in action.
// testClient.js
const rateLimitedFetch = require('./fetchRateLimited');

async function fireMany(n) {
  const promises = Array.from({length: n}).map(async (_, i) => {
    try {
      // replace with a real URL or a local mock server
      const res = await rateLimitedFetch('https://httpbin.org/get');
      console.log(i, 'ok', res.status);
    } catch (err) {
      console.log(i, 'err', err.message);
    }
  });
  await Promise.all(promises);
}

fireMany(20);
Run the script and observe how requests are paced instead of all hitting at once.
Considerations and trade-offs
Single process vs distributed. In-memory limiters are fast but only work per process. Use Redis or another shared datastore for multi-instance deployments; a minimal Redis-backed sketch follows this list.
Accuracy vs performance. A sliding window gives more accurate limits but is heavier. The token bucket is a good balance.
Persistence. Decide whether to persist counters. Persistent stores can survive restarts.
Client hints. Include headers like X-RateLimit-Limit, X-RateLimit-Remaining, and Retry-After so clients can behave gracefully.
Backoff strategy. When a request is rejected, the client should retry with exponential backoff and jitter to avoid creating synchronized bursts.
State cleanup. Remove inactive keys to avoid memory leaks. For example, attach a last-seen timestamp and prune periodically.
Security. Rate limiting by IP is simple but can be bypassed by proxies. For authenticated APIs, prefer user ID or API key-based limits.
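To illustrate the shared-store option, here is a minimal fixed-window limiter backed by Redis. It assumes the ioredis client and illustrative key names and limits; in a real deployment you would want the increment and expiry to run atomically, for example via a Lua script.
// redisFixedWindow.js - fixed-window limiter shared across instances (sketch, assumes ioredis)
const Redis = require('ioredis');
const redis = new Redis(); // defaults to localhost:6379

const WINDOW_SECONDS = 60;
const LIMIT = 100;

async function allowRequest(key) {
  // The window id is part of the key, so each window gets its own counter
  const windowId = Math.floor(Date.now() / (WINDOW_SECONDS * 1000));
  const redisKey = `ratelimit:${key}:${windowId}`;
  const count = await redis.incr(redisKey);
  if (count === 1) {
    // First request in this window: let the key expire once the window is long over
    await redis.expire(redisKey, WINDOW_SECONDS * 2);
  }
  return count <= LIMIT;
}

module.exports = allowRequest;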
When to be stricter or more lenient
If your endpoints are critical or expensive, apply stricter limits. If you want a better user experience for paid customers, consider tiered limits based on plan, as sketched below. Also consider allowing higher burst capacity for short periods while enforcing a lower sustained rate.
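One way to express tiered limits is to swap the per-IP lookup from the middleware above for a per-user lookup that picks bucket parameters from the customer's plan. The plan names and numbers here are purely illustrative.
// Illustrative plan tiers: higher burst and sustained rate for paid customers
const PLAN_LIMITS = {
  free: {capacity: 10, refillTokens: 5, refillIntervalMs: 1000},
  pro: {capacity: 50, refillTokens: 25, refillIntervalMs: 1000}
};

function getBucketForUser(userId, plan = 'free') {
  if (!buckets.has(userId)) {
    buckets.set(userId, new TokenBucket(PLAN_LIMITS[plan] || PLAN_LIMITS.free));
  }
  return buckets.get(userId);
}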
Conclusion
Rate limiting protects both servers and clients. The token bucket algorithm is flexible and easy to implement. The examples provided here serve as a starting point for both server and client usage. For production systems, move the state to a shared store and add monitoring so you can track when limits are reached and adjust parameters accordingly.