ASP.NET Core  

How to Implement Rate Limiting and API Throttling in ASP.NET Core

Rate limiting (a.k.a. API throttling) protects your APIs from abuse, prevents overload, and enforces fair usage. In ASP.NET Core you can implement it at several levels: per-IP, per-user, per-endpoint, and globally — using built-in middleware, third-party libraries, or a distributed store (Redis) for multi-instance scenarios.

This guide explains practical strategies, shows ready-to-drop-in code (modern ASP.NET Core), and covers production concerns: headers, retry hints, metrics, and pattern recommendations.

1. Goals & common policies

Before implementation, decide what you want to protect against. Typical goals:

  • Prevent too many requests from a single client (DoS protection).

  • Enforce fair API quotas for paid/free tiers (business rules).

  • Smooth bursts while allowing sustained throughput (token bucket).

  • Rate limit across multiple app instances (distributed counters).

Common rate limit strategies

  • Fixed window: count requests per fixed interval (easy, bursty).

  • Sliding window: smoother than fixed window (slightly more complex).

  • Leaky bucket / Token bucket: allow bursts up to capacity, then steady rate.

  • Concurrency limit: limit concurrent active operations (server load control).

  • Request cost: weight endpoints by cost (heavy endpoints use more tokens).
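
To make the token bucket idea concrete before reaching for the built-in limiter, here is a minimal in-memory sketch. The class name and API are purely illustrative — in real code you would use TokenBucketRateLimiter from System.Threading.RateLimiting, shown in the next section:

```csharp
using System;

// Illustration only: a tiny token bucket (burst capacity + steady refill rate).
public class TokenBucket
{
    private readonly int _capacity;        // maximum burst size
    private readonly double _refillPerSec; // steady refill rate (tokens/second)
    private double _tokens;
    private DateTime _lastRefill = DateTime.UtcNow;
    private readonly object _lock = new();

    public TokenBucket(int capacity, double refillPerSec)
    {
        _capacity = capacity;
        _refillPerSec = refillPerSec;
        _tokens = capacity; // start full so an initial burst is allowed
    }

    public bool TryAcquire(int cost = 1)
    {
        lock (_lock)
        {
            var now = DateTime.UtcNow;
            // Refill proportionally to elapsed time, capped at capacity.
            _tokens = Math.Min(_capacity, _tokens + (now - _lastRefill).TotalSeconds * _refillPerSec);
            _lastRefill = now;
            if (_tokens < cost) return false; // over the limit: reject (or queue)
            _tokens -= cost;
            return true;
        }
    }
}
```

The `cost` parameter is what makes request-cost weighting possible: a heavy endpoint simply acquires more tokens per call.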

2. Built-in Rate Limiting (ASP.NET Core 7+)

.NET provides Microsoft.AspNetCore.RateLimiting (built on System.Threading.RateLimiting) with RateLimiter, TokenBucketRateLimiter, and PartitionedRateLimiter. It integrates as middleware and is the recommended starting point for single-instance apps, or as the host for a custom distributed partitioner.

2.1 Simple Token Bucket per IP (example)

// Program.cs (minimal API / .NET 7+)
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

// Register rate limiting
builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // Partition by remote IP: each IP gets its own token bucket
    options.AddPolicy("PerIpPolicy", context =>
        RateLimitPartition.GetTokenBucketLimiter(
            context.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            _ => new TokenBucketRateLimiterOptions
            {
                // allow a burst of 10 tokens, refill 1 token per second
                TokenLimit = 10,
                QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
                QueueLimit = 0,
                ReplenishmentPeriod = TimeSpan.FromSeconds(1),
                TokensPerPeriod = 1,
                AutoReplenishment = true
            }));
});

var app = builder.Build();

// Use middleware
app.UseRateLimiter();

// Example endpoint with policy
app.MapGet("/api/data", () => Results.Ok("OK"))
   .RequireRateLimiting("PerIpPolicy");

app.Run();

Notes

  • This built-in approach uses in-memory token buckets (per app instance).

  • TokenLimit is burst capacity; TokensPerPeriod controls steady rate.

2.2 Per-User or Per-API Key Partitioning

If your API authenticates clients, partition by user id or API key instead of IP:

options.AddPolicy("PerUserPolicy", context =>
{
    var userId = context.User.FindFirst("sub")?.Value ?? "anon";
    return RateLimitPartition.GetFixedWindowLimiter(userId, _ => new FixedWindowRateLimiterOptions
    {
        PermitLimit = 100,
        Window = TimeSpan.FromMinutes(1),
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit = 0
    });
});
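
Besides named policies, the built-in middleware also supports a global limiter that runs for every request. A hedged sketch, partitioning by authenticated user name and falling back to IP (the limits shown are illustrative):

```csharp
using System.Threading.RateLimiting;

// Program.cs fragment: a global fixed-window limiter for all endpoints
builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
    {
        // Prefer a stable identity; fall back to remote IP for anonymous traffic
        var key = context.User.Identity?.Name
                  ?? context.Connection.RemoteIpAddress?.ToString()
                  ?? "anon";

        return RateLimitPartition.GetFixedWindowLimiter(key, _ => new FixedWindowRateLimiterOptions
        {
            PermitLimit = 1000,
            Window = TimeSpan.FromMinutes(1),
            QueueLimit = 0
        });
    });
});
```

The global limiter applies on top of any per-endpoint policies, which is useful as a coarse safety net.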

3. Distributed Rate Limiting (multi-instance) — Redis approach

In multi-instance scenarios (Kubernetes, multiple app servers), in-memory counters don't work. Use a distributed store such as Redis.

Two practical approaches:

  1. Use a library that implements distributed rate limiting (e.g., AspNetCoreRateLimit, which can store counters in a distributed cache such as Redis via IDistributedCache; other libraries exist).

  2. Implement a simple Redis-backed fixed-window or token-bucket using atomic Redis commands (INCR + EXPIRE or Lua script) and custom middleware.

3.1 Redis fixed-window example (middleware)

Pros: simple, works well for quotas.
Cons: a fixed window allows a double burst at the window boundary; a sliding window or token bucket (via a Lua script) smooths traffic better.

// RedisRateLimitMiddleware.cs
using StackExchange.Redis;
using Microsoft.AspNetCore.Http;

public class RedisRateLimitMiddleware
{
    private readonly RequestDelegate _next;
    private readonly IDatabase _redis;
    private readonly int _limit;
    private readonly TimeSpan _window;

    public RedisRateLimitMiddleware(RequestDelegate next, IConnectionMultiplexer multiplexer, int limit = 100, TimeSpan? window = null)
    {
        _next = next;
        _redis = multiplexer.GetDatabase();
        _limit = limit;
        _window = window ?? TimeSpan.FromMinutes(1);
    }

    public async Task InvokeAsync(HttpContext context)
    {
        var key = GetKey(context);
        var windowSeconds = (long)_window.TotalSeconds;
        var now = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
        var windowKey = $"{key}:{now / windowSeconds}"; // fixed window bucket

        // Atomic increment; the first hit in a window also sets the expiry.
        // Note: INCR + EXPIRE is two round trips — a Lua script makes this fully atomic.
        var count = await _redis.StringIncrementAsync(windowKey);
        if (count == 1)
        {
            await _redis.KeyExpireAsync(windowKey, _window);
        }

        // Seconds until the current window ends — more useful than the full window length.
        var secondsToReset = windowSeconds - (now % windowSeconds);

        if (count > _limit)
        {
            context.Response.StatusCode = StatusCodes.Status429TooManyRequests;
            context.Response.Headers["Retry-After"] = secondsToReset.ToString();
            await context.Response.WriteAsync("Rate limit exceeded");
            return;
        }

        // Optionally expose remaining quota to clients
        context.Response.Headers["X-RateLimit-Limit"] = _limit.ToString();
        context.Response.Headers["X-RateLimit-Remaining"] = Math.Max(0, _limit - count).ToString();
        await _next(context);
    }

    private static string GetKey(HttpContext context)
    {
        var ip = context.Connection.RemoteIpAddress?.ToString() ?? "unknown";
        return $"rl:{ip}:{context.Request.Path}";
    }
}

Register in Program.cs:

// before builder.Build()
var mux = await ConnectionMultiplexer.ConnectAsync("redis:6379");
builder.Services.AddSingleton<IConnectionMultiplexer>(mux);

// after builder.Build(), early in the pipeline
app.UseMiddleware<RedisRateLimitMiddleware>();

3.2 Token bucket with Redis (Lua script)

For smoother limiting and atomicity, use a Lua script that stores token count and last refill timestamp and returns remaining tokens. This avoids race conditions. Implementation is longer — consider using an existing library if you want robust tokens with refill rates.
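
A minimal sketch of such a script, invoked through StackExchange.Redis. The key layout, capacity (burst of 10), and refill rate (1 token/second) are illustrative assumptions, not a drop-in implementation:

```csharp
using StackExchange.Redis;

public static class RedisTokenBucket
{
    // The Lua script runs atomically on the Redis server:
    // read state, refill by elapsed time, deduct, persist.
    private const string Script = @"
local key      = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate     = tonumber(ARGV[2])  -- tokens per second
local now      = tonumber(ARGV[3])  -- current unix time (seconds)
local cost     = tonumber(ARGV[4])

local data   = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(data[1]) or capacity
local ts     = tonumber(data[2]) or now

-- refill proportionally to elapsed time, capped at capacity
tokens = math.min(capacity, tokens + (now - ts) * rate)

local allowed = tokens >= cost
if allowed then tokens = tokens - cost end

redis.call('HMSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, math.ceil(capacity / rate) * 2)
return allowed and 1 or 0";

    public static async Task<bool> TryAcquireAsync(IDatabase db, string clientKey, int cost = 1)
    {
        var result = await db.ScriptEvaluateAsync(Script,
            new RedisKey[] { $"rl:tb:{clientKey}" },
            new RedisValue[] { 10, 1, DateTimeOffset.UtcNow.ToUnixTimeSeconds(), cost });
        return (long)result == 1;
    }
}
```

Because the whole read-modify-write happens inside one script, concurrent requests from multiple app instances cannot race on the counter.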

4. Endpoint and policy mapping

You can declare different policies per endpoint:

  • Public endpoints: strict limit (e.g., 10 req/min per IP).

  • Authenticated endpoints: soft quota (e.g., 1000 req/min per user).

  • Admin clients: exempt or higher limits.

  • Heavy endpoints (export): cost more tokens.

With built-in middleware

app.MapGet("/public", () => "public").RequireRateLimiting("PerIpPolicy");
app.MapGet("/user", () => "user").RequireRateLimiting("PerUserPolicy");
app.MapGet("/heavy", () => "heavy").RequireRateLimiting("ExpensivePolicy");
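
The "ExpensivePolicy" above is not defined earlier; one plausible definition (a sketch — the name and limits are assumptions) is a small per-IP concurrency limiter, which caps server load directly rather than request rate:

```csharp
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

// Program.cs fragment: a stricter policy for heavy endpoints
builder.Services.AddRateLimiter(options =>
{
    options.AddPolicy("ExpensivePolicy", context =>
        RateLimitPartition.GetConcurrencyLimiter(
            context.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            _ => new ConcurrencyLimiterOptions
            {
                PermitLimit = 2,  // at most 2 concurrent heavy requests per IP
                QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
                QueueLimit = 4    // let a few requests wait instead of failing fast
            }));
});
```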

For ASP.NET Core controllers

[HttpGet]
[EnableRateLimiting("PerUserPolicy")]
public IActionResult Get() => Ok();

5. Informing clients — headers & Retry-After

Follow common header conventions so clients can react gracefully:

  • Retry-After — seconds to wait (for 429).

  • X-RateLimit-Limit — total allowed in window.

  • X-RateLimit-Remaining — remaining requests.

  • X-RateLimit-Reset — epoch/seconds when window resets.

Example header setting (middleware)

context.Response.Headers["X-RateLimit-Limit"] = limit.ToString();
context.Response.Headers["X-RateLimit-Remaining"] = (limit - used).ToString();
context.Response.Headers["X-RateLimit-Reset"] = resetTimeSeconds.ToString();

Be careful: set headers before writing the response body, and consider whether exposing exact limits and remaining counts reveals more about your policies than you want (making them optional is fine).
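
With the built-in middleware, the Retry-After hint can be emitted centrally via the OnRejected callback. A sketch — note that the limiter only exposes RetryAfter metadata when the algorithm can compute it (e.g., replenishing limiters):

```csharp
using System.Threading.RateLimiting;

// Program.cs fragment: central 429 handling for the built-in rate limiter
builder.Services.AddRateLimiter(options =>
{
    options.OnRejected = async (context, cancellationToken) =>
    {
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

        // RetryAfter metadata is only present for some algorithms/configurations.
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString();
        }

        await context.HttpContext.Response.WriteAsync(
            "Rate limit exceeded", cancellationToken);
    };
});
```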

6. Handling bursts vs steady rate (recommendations)

  • Use token bucket for smoothing bursts: allow a burst (token capacity) and a steady refill rate.

  • Use fixed window for simplicity, but accept boundary bursts.

  • Consider sliding window or leaky bucket if spikes at window boundary are a problem.

  • For paid tiers, use quotas (monthly requests) + rate limits (per minute) combined.

7. Security & fairness concerns

  • Identify clients reliably: API Key, client certificate, JWT sub claim — prefer stable identifiers over IP (users behind NAT share IP).

  • Avoid IP-only limits for mobile apps (carrier NAT can aggregate many users).

  • Allow admin/monitoring IP exceptions but register exemptions carefully.

  • Protect rate limit endpoints from enumeration (don’t leak specifics about other users’ quotas).

8. Observability & Metrics

Monitor and alert

  • Rate limit hits (429) per endpoint and per client

  • Average requests/sec and latency

  • Queue lengths if using queued requests (TokenBucket queue)

  • Errors correlated with rate limits (clients hammering)

Expose Prometheus metrics — counters for rate_limit_exceeded_total, rate_limit_allowed_total, rate_limit_requests_total.

For example, increment these counters in the middleware whenever a request is allowed or rejected.
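
A sketch of those counters, assuming the prometheus-net package (metric names match the ones suggested above; the endpoint label is an illustrative choice):

```csharp
using Prometheus;

public static class RateLimitMetrics
{
    public static readonly Counter Requests =
        Metrics.CreateCounter("rate_limit_requests_total",
            "All requests seen by the rate limiter.");

    public static readonly Counter Allowed =
        Metrics.CreateCounter("rate_limit_allowed_total",
            "Requests that passed the rate limiter.");

    public static readonly Counter Exceeded =
        Metrics.CreateCounter("rate_limit_exceeded_total",
            "Requests rejected with 429.",
            new CounterConfiguration { LabelNames = new[] { "endpoint" } });
}

// In the middleware:
//   RateLimitMetrics.Requests.Inc();
//   if (rejected) RateLimitMetrics.Exceeded.WithLabels(context.Request.Path).Inc();
//   else          RateLimitMetrics.Allowed.Inc();
```

Beware of label cardinality: prefer route templates over raw paths for the endpoint label.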

9. Testing & validating

  • Unit test middleware behavior (mock Redis or use in-memory).

  • Load test with k6, wrk2, or JMeter to validate desired throughput and behavior.

  • Test edge cases: concurrent requests, deployment scaling (multi-instance), clock shifts.

  • Test mobile and NAT-heavy networks to ensure fair user policies.

10. Advanced patterns & business features

  • Quota system: Keep monthly or daily counters (Redis/DB); combine with rate limiter for short-term limits.

  • Dynamic policy: Store policies in DB and reload them (IOptionsMonitor) so you can adjust without deployments.

  • Backoff headers: Add Retry-After or X-Backoff hints for graceful client retry.

  • Grace periods: For authenticated users, allow temporary burst overage with billing (metering).

  • Per-route costs: Give each endpoint a token cost (e.g., export=10 tokens), deduct tokens on each request.
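
Per-route costs can be sketched on top of one shared token bucket by acquiring more than one permit for heavy endpoints. The endpoint costs and limits below are illustrative assumptions:

```csharp
using System.Threading.RateLimiting;

// Program.cs fragment: charge endpoints different token costs from a shared bucket
var bucket = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
{
    TokenLimit = 100,
    TokensPerPeriod = 10,
    ReplenishmentPeriod = TimeSpan.FromSeconds(1),
    QueueLimit = 0,
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
    AutoReplenishment = true
});

static int CostOf(PathString path) =>
    path.StartsWithSegments("/export") ? 10 : 1; // heavy endpoints cost more

app.Use(async (context, next) =>
{
    using var lease = bucket.AttemptAcquire(CostOf(context.Request.Path));
    if (!lease.IsAcquired)
    {
        context.Response.StatusCode = StatusCodes.Status429TooManyRequests;
        return;
    }
    await next(context);
});
```

In production you would partition the bucket per client (as in section 2) rather than share one bucket across all callers.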

11. Fallback & graceful degradation

When the limit is reached, prefer returning a consistent 429 response with a JSON body that includes retryAfter and the cause:

{
  "error": "rate_limit_exceeded",
  "message": "You have exceeded your request quota.",
  "retryAfter": 60
}

Where possible

  • Offer cached fallback results for read endpoints.

  • Queue non-urgent work (e.g., background export jobs) and notify user by email when ready.

12. Example: Combined approach (built-in + Redis for distributed)

A recommended production architecture:

  • Use built-in RateLimiter middleware for single-instance or local dev.

  • Implement a Redis-backed partitioner (custom PartitionedRateLimiter) to enforce limits across instances.

  • Keep policy definitions in a config store (DB/Redis) and load them on startup or via admin UI.

  • Emit metrics to Prometheus and set up alerts for spikes in 429s.

  • Provide Retry-After and X-RateLimit-* headers.

13. Quick checklist before production rollout

  • Decide identifier for partition (API key/user id vs IP)

  • Choose algorithm (token bucket / fixed window / sliding)

  • If multi-instance, implement distributed counters (Redis or external gateway)

  • Add informative headers and standard 429 responses

  • Add metrics for allowed/rejected requests

  • Write load tests to validate limits and saturation behavior

  • Plan exemption and admin policies

  • Ensure rate limiter is in middleware pipeline before heavy processing (e.g., before body reading or DB calls) to save resources

  • Ensure intermediaries don't cache 429 responses or stale X-RateLimit-* headers (keep content-type and caching concerns independent of rate limit headers)

14. Final notes & recommended libraries

  • Built-in: Microsoft.AspNetCore.RateLimiting (use this as first choice for modern ASP.NET Core).

  • Redis: use StackExchange.Redis + Lua scripts for atomic token bucket if you need smooth distributed tokens.

  • Third-party: AspNetCoreRateLimit (popular) — but review activity and support before adopting.

  • API gateways: For many microservices, enforce rate limits at gateway (NGINX, Envoy, Kong, Azure API Management) — simpler and centralized.

Gateway vs App

  • Gateways are ideal for simple per-IP and per-key limits (offloads app).

  • App-level rate limiting enables rich context-aware policies (user-level, business rules).