Rate limiting is a basic but important part of any system that talks to APIs. It helps protect services from overload, prevents abuse, and ensures fair usage for all clients. This article walks through why rate limiting matters, common algorithms, and practical JavaScript implementations you can drop into a server or client. Code is included so you can try things out right away.
Why rate limiting matters
When many clients hit an API at the same time, the server can slow down or crash. Rate limiting gives you control. On the server side, it prevents a single client from using all resources. On the client side, it helps you avoid being throttled by third-party APIs. In short, rate limiting improves reliability and user experience.
Common rate-limiting strategies
Here are the most common algorithms, explained in plain terms.
Fixed Window. Count requests inside fixed intervals, for example, 100 requests per minute. It is easy to implement. The downside is a bursty edge case: a client that sends a full quota just before a window boundary and another full quota just after it can briefly get through nearly twice the limit (a minimal sketch follows this list).
Sliding Window. A refinement of the fixed window. It smooths out bursts by measuring requests in a rolling time window.
Token Bucket. Tokens are added to a bucket at a fixed rate. A request consumes one token. If tokens are available, the request proceeds; otherwise, it is rejected or queued. This allows bursts up to the bucket capacity.
Leaky Bucket. Requests enter a queue and are processed at a fixed rate. It enforces a steady output rate and smooths incoming bursts.
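To make the fixed-window boundary issue concrete, here is a minimal sketch of that approach. The function name, key scheme, and limits are illustrative and are not used by the later examples.
// fixedWindow.js - minimal fixed-window counter (illustrative sketch)
const WINDOW_MS = 60 * 1000; // one-minute windows
const LIMIT = 100; // max requests per window

const counters = new Map(); // key -> { windowStart, count }

function allowRequest(key) {
  // Align the window to the clock so every client shares the same boundaries
  const windowStart = Math.floor(Date.now() / WINDOW_MS) * WINDOW_MS;
  const entry = counters.get(key);
  if (!entry || entry.windowStart !== windowStart) {
    counters.set(key, { windowStart, count: 1 });
    return true;
  }
  if (entry.count < LIMIT) {
    entry.count += 1;
    return true;
  }
  return false; // over the limit for this window
}

module.exports = allowRequest;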
Which one to pick depends on your needs. For most API client uses, the token bucket gives flexible and predictable behavior. For server-side enforcement with strict fairness, a sliding window or a leaky bucket is a good option.
Below, we will implement a compact token bucket in JavaScript, then show an Express middleware and a client-side fetch wrapper that use it.
Token bucket implementation in JavaScript
This implementation is in-memory and single-process. For distributed systems, you would use Redis or another shared store. The code is intentionally simple so you can understand the mechanics.
// tokenBucket.js
class TokenBucket {
  constructor({capacity, refillTokens, refillIntervalMs}) {
    this.capacity = capacity; // max tokens in bucket
    this.tokens = capacity; // current tokens
    this.refillTokens = refillTokens; // tokens to add each interval
    this.refillIntervalMs = refillIntervalMs; // interval in ms
    this.lastRefill = Date.now();
  }

  refill() {
    const now = Date.now();
    const elapsed = now - this.lastRefill;
    if (elapsed <= 0) return;
    const intervalsPassed = Math.floor(elapsed / this.refillIntervalMs);
    if (intervalsPassed > 0) {
      const newTokens = intervalsPassed * this.refillTokens;
      this.tokens = Math.min(this.capacity, this.tokens + newTokens);
      this.lastRefill += intervalsPassed * this.refillIntervalMs;
    }
  }

  tryRemoveTokens(count = 1) {
    this.refill();
    if (this.tokens >= count) {
      this.tokens -= count;
      return true;
    }
    return false;
  }

  getTokens() {
    this.refill();
    return this.tokens;
  }
}

module.exports = TokenBucket;
Usage note: capacity is the maximum burst allowed. refillTokens and refillIntervalMs control steady-state throughput. For example, to allow 10 requests per second with a burst of up to 20, set capacity = 20, refillTokens = 10, and refillIntervalMs = 1000.
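Concretely, that configuration looks like this:
const TokenBucket = require('./tokenBucket');

// Burst of up to 20, sustained rate of 10 tokens per second
const bucket = new TokenBucket({capacity: 20, refillTokens: 10, refillIntervalMs: 1000});

console.log(bucket.tryRemoveTokens(15)); // true - within burst capacity
console.log(bucket.tryRemoveTokens(10)); // false - only 5 tokens left
console.log(bucket.getTokens()); // 5, plus whatever has refilled in the meantime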
Server-side example - Express middleware
Below is a simple Express middleware that limits each client by IP. It creates a token bucket for each IP and rejects requests that exceed the bucket.
// rateLimitMiddleware.js
const TokenBucket = require('./tokenBucket');

const buckets = new Map();

function getBucketForIp(ip) {
  if (!buckets.has(ip)) {
    // capacity 20, refill 10 tokens every second
    buckets.set(ip, new TokenBucket({
      capacity: 20,
      refillTokens: 10,
      refillIntervalMs: 1000
    }));
  }
  return buckets.get(ip);
}

function rateLimitMiddleware(req, res, next) {
  const ip = req.ip || req.socket.remoteAddress; // req.connection is deprecated in modern Node
  const bucket = getBucketForIp(ip);
  if (bucket.tryRemoveTokens(1)) {
    return next();
  }
  res.status(429).json({
    error: 'Too many requests',
    retryAfterMs: 1000 // hint for clients
  });
}

module.exports = rateLimitMiddleware;
Place this middleware near the top of your middleware stack. For production use, you should persist buckets in Redis or similar so limits are shared across instances. Also, add a periodic cleanup routine to remove old entries from buckets to avoid memory growth; one possible shape for that cleanup follows.
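Here is a sketch of such a routine, written as additions to rateLimitMiddleware.js. The touch helper and the idle threshold are illustrative, not part of the TokenBucket class.
// Additions to rateLimitMiddleware.js - prune buckets for clients that have gone quiet (sketch)
const lastSeen = new Map(); // ip -> timestamp of the client's last request
const MAX_IDLE_MS = 10 * 60 * 1000; // forget clients idle for 10 minutes

// Record activity; call this from the middleware after looking up the bucket
function touch(ip) {
  lastSeen.set(ip, Date.now());
}

setInterval(() => {
  const now = Date.now();
  for (const [ip, seen] of lastSeen) {
    if (now - seen > MAX_IDLE_MS) {
      lastSeen.delete(ip);
      buckets.delete(ip);
    }
  }
}, 60 * 1000).unref(); // unref so the timer does not keep the process alive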
Client-side example - fetch wrapper
If you are calling a third-party API that enforces a rate limit, you can build a client wrapper that delays requests when tokens are not available. Here is a small wrapper that either queues or rejects requests.
// fetchRateLimited.js
const TokenBucket = require('./tokenBucket');

const bucket = new TokenBucket({capacity: 5, refillTokens: 5, refillIntervalMs: 1000}); // 5 req/s

function waitForToken(timeoutMs = 5000) {
  const start = Date.now();
  return new Promise((resolve, reject) => {
    function attempt() {
      if (bucket.tryRemoveTokens(1)) return resolve();
      if (Date.now() - start > timeoutMs) return reject(new Error('Rate limit wait timeout'));
      // try again after a short delay
      setTimeout(attempt, 50);
    }
    attempt();
  });
}

async function rateLimitedFetch(url, options = {}) {
  await waitForToken();
  return fetch(url, options);
}

module.exports = rateLimitedFetch;
This wrapper blocks until a token is available or until the timeout expires. It is suitable for clients that can tolerate a small amount of waiting. For low-latency needs, consider rejecting quickly and letting the caller retry with a backoff; a sketch of that approach follows.
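For that fail-fast style, a retry helper with exponential backoff and jitter might look like the sketch below. The attempt count and delays are illustrative, and it assumes the server signals rejection with a 429 status, as the middleware above does.
// fetchWithBackoff.js - retry on 429 with exponential backoff and jitter (sketch)
async function fetchWithBackoff(url, options = {}, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(url, options);
    if (res.status !== 429) return res; // success, or an error that is not rate limiting
    // Base delay doubles each attempt; random jitter spreads out synchronized clients
    const baseDelayMs = 200 * 2 ** attempt;
    const jitterMs = Math.random() * baseDelayMs;
    await new Promise(resolve => setTimeout(resolve, baseDelayMs + jitterMs));
  }
  throw new Error('Rate limit retries exhausted');
}

module.exports = fetchWithBackoff;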
Testing your rate limiter
Automated tests help ensure the limiter behaves well under bursts.
Write unit tests for the token refill logic. Simulate time progression by mocking Date.now; a minimal example follows this list.
Test the middleware by firing many concurrent requests using a load tool or a script. Verify that the number of responses with status 200 versus 429 matches expectations.
Test edge cases, such as when the server restarts or when buckets are removed from memory.
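As an example of the first point, here is a minimal refill test. It assumes Jest; any test runner that lets you stub Date.now works the same way.
// tokenBucket.test.js - refill logic with mocked time (Jest sketch)
const TokenBucket = require('./tokenBucket');

test('refills tokens as mocked time advances', () => {
  let now = 0;
  jest.spyOn(Date, 'now').mockImplementation(() => now);

  const bucket = new TokenBucket({capacity: 2, refillTokens: 1, refillIntervalMs: 1000});
  expect(bucket.tryRemoveTokens(2)).toBe(true); // burst drains the bucket
  expect(bucket.tryRemoveTokens(1)).toBe(false); // empty, no time has passed

  now = 1000; // advance one refill interval
  expect(bucket.tryRemoveTokens(1)).toBe(true); // one token refilled

  jest.restoreAllMocks();
});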
Here is a quick test script you can run against the fetch wrapper to see it in action.
// testClient.js
const rateLimitedFetch = require('./fetchRateLimited');

async function fireMany(n) {
  const promises = Array.from({length: n}).map(async (_, i) => {
    try {
      // replace with a real URL or a local mock server
      const res = await rateLimitedFetch('https://httpbin.org/get');
      console.log(i, 'ok', res.status);
    } catch (err) {
      console.log(i, 'err', err.message);
    }
  });
  await Promise.all(promises);
}

fireMany(20);
Run the script and observe how requests are paced instead of all hitting at once.
Considerations and trade-offs
Single process vs distributed. In-memory limiters are fast but only work per process. Use Redis or another shared datastore for multi-instance deployments; a minimal Redis-backed sketch follows this list.
Accuracy vs performance. A sliding window gives more accurate limits but is heavier. The token bucket is a good balance.
Persistence. Decide whether to persist counters. Persistent stores can survive restarts.
Client hints. Include headers like X-RateLimit-Limit, X-RateLimit-Remaining, and Retry-After so clients can behave gracefully.
Backoff strategy. When a request is rejected, the client should retry with exponential backoff and jitter to avoid creating synchronized bursts.
State cleanup. Remove inactive keys to avoid memory leaks. For example, attach a last-seen timestamp and prune periodically.
Security. Rate limiting by IP is simple but can be bypassed by proxies. For authenticated APIs, prefer user ID or API key-based limits.
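To illustrate the shared-store option, here is a minimal fixed-window limiter backed by Redis. It assumes the ioredis client and illustrative key names and limits; in a real deployment you would want the increment and expiry to run atomically, for example via a Lua script.
// redisFixedWindow.js - fixed-window limiter shared across instances (sketch, assumes ioredis)
const Redis = require('ioredis');
const redis = new Redis(); // defaults to localhost:6379

const WINDOW_SECONDS = 60;
const LIMIT = 100;

async function allowRequest(key) {
  // The window id is part of the key, so each window gets its own counter
  const windowId = Math.floor(Date.now() / (WINDOW_SECONDS * 1000));
  const redisKey = `ratelimit:${key}:${windowId}`;
  const count = await redis.incr(redisKey);
  if (count === 1) {
    // First request in this window: let the key expire once the window is long over
    await redis.expire(redisKey, WINDOW_SECONDS * 2);
  }
  return count <= LIMIT;
}

module.exports = allowRequest;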
When to be stricter or more lenient
If your endpoints are critical or expensive, apply stricter limits. If you want a better user experience for paid customers, consider tiered limits based on plan, as sketched below. Also consider allowing higher burst capacity for short periods while enforcing a lower sustained rate.
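One way to express tiered limits is to swap the per-IP lookup from the middleware above for a per-user lookup that picks bucket parameters from the customer's plan. The plan names and numbers here are purely illustrative.
// Illustrative plan tiers: higher burst and sustained rate for paid customers
const PLAN_LIMITS = {
  free: {capacity: 10, refillTokens: 5, refillIntervalMs: 1000},
  pro: {capacity: 50, refillTokens: 25, refillIntervalMs: 1000}
};

function getBucketForUser(userId, plan = 'free') {
  if (!buckets.has(userId)) {
    buckets.set(userId, new TokenBucket(PLAN_LIMITS[plan] || PLAN_LIMITS.free));
  }
  return buckets.get(userId);
}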
Conclusion
Rate limiting protects both servers and clients. The token bucket algorithm is flexible and easy to implement. The examples provided here serve as a starting point for both server and client usage. For production systems, move the state to a shared store and add monitoring so you can track when limits are reached and adjust parameters accordingly.