Rate limiting (a.k.a. API throttling) protects your APIs from abuse, prevents overload, and enforces fair usage. In ASP.NET Core you can implement it at many levels: per-IP, per-user, per-endpoint, and globally — using built-in middleware, third-party libraries, or a distributed store (Redis) for multi-instance scenarios.
This guide explains practical strategies, shows ready-to-drop-in code (modern ASP.NET Core), and covers production concerns: headers, retry hints, metrics, and pattern recommendations.
1. Goals & common policies
Before implementation, decide what you want to protect against. Typical goals:
Prevent too many requests from a single client (DoS protection).
Enforce fair API quotas for paid/free tiers (business rules).
Smooth bursts while allowing sustained throughput (token bucket).
Rate limit across multiple app instances (distributed counters).
Common rate limit strategies
Fixed window: count requests per fixed interval (easy, bursty).
Sliding window: smoother than fixed window (slightly more complex).
Leaky bucket / Token bucket: allow bursts up to capacity, then steady rate.
Concurrency limit: limit concurrent active operations (server load control).
Request cost: weight endpoints by cost (heavy endpoints use more tokens).
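To make the token-bucket idea concrete before diving into the framework APIs, here is a minimal illustrative sketch (the `TokenBucket` class and its members are invented for this example; it is not the built-in limiter, which is covered next):

```csharp
using System;

// Minimal token bucket: capacity caps the burst size,
// refillPerSecond sets the sustained rate. Illustrative only.
public class TokenBucket
{
    private readonly double _capacity;
    private readonly double _refillPerSecond;
    private double _tokens;
    private DateTime _lastRefill;

    public TokenBucket(double capacity, double refillPerSecond)
    {
        _capacity = capacity;
        _refillPerSecond = refillPerSecond;
        _tokens = capacity;          // start full: a fresh client may burst
        _lastRefill = DateTime.UtcNow;
    }

    public bool TryConsume(double tokens = 1)
    {
        Refill();
        if (_tokens < tokens) return false;   // over the limit: reject
        _tokens -= tokens;
        return true;
    }

    private void Refill()
    {
        // Credit tokens for the time elapsed since the last call, capped at capacity
        var now = DateTime.UtcNow;
        var elapsed = (now - _lastRefill).TotalSeconds;
        _tokens = Math.Min(_capacity, _tokens + elapsed * _refillPerSecond);
        _lastRefill = now;
    }
}
```

A bucket with capacity 10 and refill rate 1/s allows an initial burst of 10 requests, then settles to one request per second on average.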
2. Built-in Rate Limiting (ASP.NET Core 7+)
.NET provides Microsoft.AspNetCore.RateLimiting (built on System.Threading.RateLimiting) with RateLimiter, TokenBucketRateLimiter, and PartitionedRateLimiter. It integrates as middleware and is the recommended starting point for single-instance apps, or for multi-instance apps if you supply a custom distributed partitioner.
2.1 Simple Token Bucket per IP (example)
// Program.cs (minimal API, .NET 7+)
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

// Register rate limiting
builder.Services.AddRateLimiter(options =>
{
    // Partition by remote IP: each client IP gets its own token bucket
    options.AddPolicy("PerIpPolicy", context =>
        RateLimitPartition.GetTokenBucketLimiter(
            context.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            _ => new TokenBucketRateLimiterOptions
            {
                // capacity 10 tokens, refill 1 token per second
                TokenLimit = 10,
                QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
                QueueLimit = 0,
                ReplenishmentPeriod = TimeSpan.FromSeconds(1),
                TokensPerPeriod = 1,
                AutoReplenishment = true
            }));
});

var app = builder.Build();

// Use middleware
app.UseRateLimiter();

// Example endpoint with policy
app.MapGet("/api/data", () => Results.Ok("OK"))
    .RequireRateLimiting("PerIpPolicy");

app.Run();
Notes: the framework's static RateLimitPartition class already ships factory helpers (GetTokenBucketLimiter, GetFixedWindowLimiter, GetSlidingWindowLimiter, GetConcurrencyLimiter), so you rarely need a custom partitioner class. QueueLimit = 0 rejects excess requests immediately with 429; a positive value queues them instead, served in QueueProcessingOrder.
2.2 Per-User or Per-API Key Partitioning
If your API authenticates clients, partition by user id or API key instead of IP:
options.AddPolicy("PerUserPolicy", context =>
{
var userId = context.User?.FindFirst("sub")?.Value ?? "anon";
return RateLimitPartition.GetFixedWindowLimiter(userId, _ => new FixedWindowRateLimiterOptions
{
PermitLimit = 100,
Window = TimeSpan.FromMinutes(1),
QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
QueueLimit = 0
});
});
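Concurrency limits (capping in-flight requests rather than request rate, as listed in section 1) follow the same partitioning pattern. A sketch using the built-in GetConcurrencyLimiter helper (the policy name is illustrative):

```csharp
// Limit each user to 5 concurrent in-flight requests; up to 10 more
// wait in a queue (oldest first) instead of being rejected outright.
options.AddPolicy("PerUserConcurrency", context =>
{
    var userId = context.User.FindFirst("sub")?.Value ?? "anon";
    return RateLimitPartition.GetConcurrencyLimiter(userId, _ => new ConcurrencyLimiterOptions
    {
        PermitLimit = 5,
        QueueLimit = 10,
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst
    });
});
```

Unlike a token bucket, a concurrency permit is returned when the request completes, so this bounds server load rather than request rate.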
3. Distributed Rate Limiting (multi-instance) — Redis approach
In multi-instance scenarios (Kubernetes, multiple app servers), in-memory counters don't work. Use a distributed store such as Redis.
Two practical approaches:
Use a library that implements distributed rate limiting (e.g., AspNetCoreRateLimit, which offers in-memory and distributed-cache stores; other libraries exist).
Implement a simple Redis-backed fixed-window or token-bucket using atomic Redis commands (INCR + EXPIRE or Lua script) and custom middleware.
3.1 Redis fixed-window example (middleware)
Pros: simple, works well for quotas.
Cons: a fixed window lets a client fire a full burst on each side of a window boundary (two windows' worth back-to-back); a sliding window or a Lua-based leaky/token bucket smooths this.
// RedisRateLimitMiddleware.cs
using Microsoft.AspNetCore.Http;
using StackExchange.Redis;

public class RedisRateLimitMiddleware
{
    private readonly RequestDelegate _next;
    private readonly IDatabase _redis;
    private readonly int _limit;
    private readonly TimeSpan _window;

    public RedisRateLimitMiddleware(RequestDelegate next, IConnectionMultiplexer multiplexer, int limit = 100, TimeSpan? window = null)
    {
        _next = next;
        _redis = multiplexer.GetDatabase();
        _limit = limit;
        _window = window ?? TimeSpan.FromMinutes(1);
    }

    public async Task InvokeAsync(HttpContext context)
    {
        var key = GetKey(context); // e.g., $"{context.Request.Path}:{clientId}"
        var now = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
        var windowKey = $"{key}:{now / (long)_window.TotalSeconds}"; // fixed window bucket

        // Atomic increment; the first request in a window creates the bucket
        var count = await _redis.StringIncrementAsync(windowKey);
        if (count == 1)
        {
            // set expiry for the bucket (if the process dies between INCR and
            // EXPIRE the key leaks without a TTL; a Lua script avoids this)
            await _redis.KeyExpireAsync(windowKey, _window);
        }

        if (count > _limit)
        {
            context.Response.StatusCode = StatusCodes.Status429TooManyRequests;
            context.Response.Headers["Retry-After"] = ((int)_window.TotalSeconds).ToString();
            await context.Response.WriteAsync("Rate limit exceeded");
            return;
        }

        // Optionally set headers with the remaining quota
        context.Response.Headers["X-RateLimit-Limit"] = _limit.ToString();
        context.Response.Headers["X-RateLimit-Remaining"] = Math.Max(0, _limit - count).ToString();

        await _next(context);
    }

    private static string GetKey(HttpContext context)
    {
        var ip = context.Connection.RemoteIpAddress?.ToString() ?? "unknown";
        return $"rl:{ip}:{context.Request.Path}";
    }
}
Register in Program.cs:
var mux = ConnectionMultiplexer.Connect("redis:6379");
builder.Services.AddSingleton<IConnectionMultiplexer>(mux);
// configure limit and window via the middleware's extra constructor arguments
app.UseMiddleware<RedisRateLimitMiddleware>(100, TimeSpan.FromMinutes(1));
3.2 Token bucket with Redis (Lua script)
For smoother limiting and atomicity, use a Lua script that stores token count and last refill timestamp and returns remaining tokens. This avoids race conditions. Implementation is longer — consider using an existing library if you want robust tokens with refill rates.
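A sketch of that approach, assuming StackExchange.Redis (the `RedisTokenBucket` class, its key layout, and the TTL heuristic are illustrative, not a library API):

```csharp
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public class RedisTokenBucket
{
    // Refill and consume atomically in one round trip. KEYS[1] is the bucket
    // hash; ARGV = capacity, refill tokens/sec, now (unix seconds), cost.
    private const string Script = @"
        local bucket = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
        local capacity = tonumber(ARGV[1])
        local rate = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])
        local cost = tonumber(ARGV[4])
        local tokens = tonumber(bucket[1]) or capacity
        local ts = tonumber(bucket[2]) or now
        tokens = math.min(capacity, tokens + (now - ts) * rate)
        local allowed = 0
        if tokens >= cost then
            tokens = tokens - cost
            allowed = 1
        end
        redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
        -- expire idle buckets after roughly two full refill cycles
        redis.call('EXPIRE', KEYS[1], math.ceil(capacity / rate) * 2)
        return allowed";

    private readonly IDatabase _db;

    public RedisTokenBucket(IConnectionMultiplexer mux) => _db = mux.GetDatabase();

    public async Task<bool> TryConsumeAsync(string key, int capacity, double ratePerSecond, int cost = 1)
    {
        var now = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
        var result = (long)await _db.ScriptEvaluateAsync(Script,
            new RedisKey[] { key },
            new RedisValue[] { capacity, ratePerSecond, now, cost });
        return result == 1;
    }
}
```

Because the whole read-refill-consume cycle runs inside one Lua script, concurrent app instances cannot race on the same bucket.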
4. Endpoint and policy mapping
You can declare different policies per endpoint:
Public endpoints: strict limit (e.g., 10 req/min per IP).
Authenticated endpoints: soft quota (e.g., 1000 req/min per user).
Admin clients: exempt or higher limits.
Heavy endpoints (export): cost more tokens.
With built-in middleware
app.MapGet("/public", () => "public").RequireRateLimiting("PerIpPolicy");
app.MapGet("/user", () => "user").RequireRateLimiting("PerUserPolicy");
app.MapGet("/heavy", () => "heavy").RequireRateLimiting("ExpensivePolicy");
For ASP.NET Core controllers
[HttpGet]
[EnableRateLimiting("PerUserPolicy")]
public IActionResult Get() => Ok();
5. Informing clients — headers & Retry-After
Follow common header conventions so clients can react gracefully:
Retry-After — seconds to wait (for 429).
X-RateLimit-Limit — total allowed in window.
X-RateLimit-Remaining — remaining requests.
X-RateLimit-Reset — epoch/seconds when window resets.
Example header setting (middleware)
context.Response.Headers["X-RateLimit-Limit"] = limit.ToString();
context.Response.Headers["X-RateLimit-Remaining"] = (limit - used).ToString();
context.Response.Headers["X-RateLimit-Reset"] = resetTimeSeconds.ToString();
Be careful: set headers before the response body starts streaming (they cannot be changed afterwards), and consider whether exposing exact limits reveals more about your infrastructure than you want.
6. Handling bursts vs steady rate (recommendations)
Use token bucket for smoothing bursts: allow a burst (token capacity) and a steady refill rate.
Use fixed window for simplicity, but accept boundary bursts.
Consider sliding window or leaky bucket if spikes at window boundary are a problem.
For paid tiers, use quotas (monthly requests) + rate limits (per minute) combined.
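The built-in middleware ships a sliding-window limiter for the boundary-spike case mentioned above. A sketch (the policy name is illustrative):

```csharp
// 100 requests per rolling minute, tracked in six 10-second segments,
// so a burst straddling a window boundary cannot double the effective rate.
options.AddPolicy("PerUserSliding", context =>
{
    var userId = context.User.FindFirst("sub")?.Value ?? "anon";
    return RateLimitPartition.GetSlidingWindowLimiter(userId, _ => new SlidingWindowRateLimiterOptions
    {
        PermitLimit = 100,
        Window = TimeSpan.FromMinutes(1),
        SegmentsPerWindow = 6,
        QueueLimit = 0
    });
});
```

More segments per window give smoother limiting at the cost of more bookkeeping per partition.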
7. Security & fairness concerns
Identify clients reliably: API Key, client certificate, JWT sub claim — prefer stable identifiers over IP (users behind NAT share IP).
Avoid IP-only limits for mobile apps (carrier NAT can aggregate many users).
Allow admin/monitoring IP exceptions but register exemptions carefully.
Protect rate limit endpoints from enumeration (don’t leak specifics about other users’ quotas).
8. Observability & Metrics
Monitor and alert
Rate limit hits (429) per endpoint and per client
Average requests/sec and latency
Queue lengths if using queued requests (TokenBucket queue)
Errors correlated with rate limits (clients hammering)
Expose Prometheus metrics — counters for rate_limit_exceeded_total, rate_limit_allowed_total, rate_limit_requests_total.
Example: increment counters in middleware when reject/allow.
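For example, with the prometheus-net package (the metric names match the list above; the static holder class and wiring are illustrative):

```csharp
using Prometheus;

public static class RateLimitMetrics
{
    // Counters labelled by endpoint so dashboards can break down 429s per route.
    public static readonly Counter Allowed = Metrics.CreateCounter(
        "rate_limit_allowed_total", "Requests allowed by the rate limiter.", "endpoint");

    public static readonly Counter Exceeded = Metrics.CreateCounter(
        "rate_limit_exceeded_total", "Requests rejected with 429.", "endpoint");
}

// In the middleware, after deciding the outcome:
//   if (count > _limit) RateLimitMetrics.Exceeded.WithLabels(path).Inc();
//   else                RateLimitMetrics.Allowed.WithLabels(path).Inc();
```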
9. Testing & validating
Unit test middleware behavior (mock Redis or use in-memory).
Load test with k6, wrk2, or JMeter to validate desired throughput and behavior.
Test edge cases: concurrent requests, deployment scaling (multi-instance), clock shifts.
Test mobile and NAT-heavy networks to ensure fair user policies.
10. Advanced patterns & business features
Quota system: Keep monthly or daily counters (Redis/DB); combine with rate limiter for short-term limits.
Dynamic policy: Store policies in DB and reload them (IOptionsMonitor) so you can adjust without deployments.
Backoff headers: Add Retry-After or X-Backoff hints for graceful client retry.
Grace periods: For authenticated users, allow temporary burst overage with billing (metering).
Per-route costs: Give each endpoint a token cost (e.g., export=10 tokens), deduct tokens on each request.
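The per-route cost idea maps directly onto the permitCount parameter of System.Threading.RateLimiting. A sketch (the `EndpointCost` class and its cost table are invented for this example):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.RateLimiting;

// Charge heavy endpoints more tokens from a shared per-client bucket.
public static class EndpointCost
{
    private static readonly Dictionary<string, int> Costs = new()
    {
        ["/api/export"] = 10,  // heavy: an export burns 10 tokens
        ["/api/data"] = 1
    };

    public static bool TryCharge(RateLimiter limiter, string path)
    {
        var cost = Costs.GetValueOrDefault(path, 1);
        // AttemptAcquire is non-blocking; the lease reports whether the tokens fit
        using var lease = limiter.AttemptAcquire(permitCount: cost);
        return lease.IsAcquired;
    }
}
```

With a token bucket of capacity 10, one export consumes the whole bucket, while cheap reads barely dent it.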
11. Fallback & graceful degradation
When limit is reached, prefer returning consistent 429 responses with JSON body including retryAfter and cause:
{
"error": "rate_limit_exceeded",
"message": "You have exceeded your request quota.",
"retryAfter": 60
}
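With the built-in middleware, the OnRejected callback can emit this body centrally instead of per endpoint. A sketch, assuming the JSON shape above:

```csharp
builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
    options.OnRejected = async (context, cancellationToken) =>
    {
        // Surface a machine-readable retry hint when the limiter provides one
        var retryAfter = 60d;
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var delay))
            retryAfter = delay.TotalSeconds;

        context.HttpContext.Response.Headers["Retry-After"] = ((int)retryAfter).ToString();
        await context.HttpContext.Response.WriteAsJsonAsync(new
        {
            error = "rate_limit_exceeded",
            message = "You have exceeded your request quota.",
            retryAfter
        }, cancellationToken);
    };
});
```

Only limiters that know their replenishment schedule (e.g., fixed window, token bucket) attach RetryAfter metadata, hence the fallback value.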
Where possible
Offer cached fallback results for read endpoints.
Queue non-urgent work (e.g., background export jobs) and notify user by email when ready.
12. Example: Combined approach (built-in + Redis for distributed)
A recommended production architecture:
Use built-in RateLimiter middleware for single-instance or local dev.
Implement a Redis-backed partitioner (custom PartitionedRateLimiter) to enforce limits across instances.
Keep policy definitions in a config store (DB/Redis) and load them on startup or via admin UI.
Emit metrics to Prometheus and set up alerts for spikes in 429s.
Provide Retry-After and X-RateLimit-* headers.
13. Quick checklist before production rollout
Decide identifier for partition (API key/user id vs IP)
Choose algorithm (token bucket / fixed window / sliding)
If multi-instance, implement distributed counters (Redis or external gateway)
Add informative headers and standard 429 responses
Add metrics for allowed/rejected requests
Write load tests to validate limits and saturation behavior
Plan exemption and admin policies
Ensure rate limiter is in middleware pipeline before heavy processing (e.g., before body reading or DB calls) to save resources
Ensure 429 responses carry correct content-type and caching headers (e.g., don't let intermediaries cache the rejection), independent of the rate limit headers themselves
14. Final notes & recommended libraries
Built-in: Microsoft.AspNetCore.RateLimiting (use this as first choice for modern ASP.NET Core).
Redis: use StackExchange.Redis + Lua scripts for atomic token bucket if you need smooth distributed tokens.
Third-party: AspNetCoreRateLimit (popular) — but review activity and support before adopting.
API gateways: For many microservices, enforce rate limits at gateway (NGINX, Envoy, Kong, Azure API Management) — simpler and centralized.
Gateway vs App
Gateways are ideal for simple per-IP and per-key limits (offloads app).
App-level rate limiting enables rich context-aware policies (user-level, business rules).