ASP.NET Core  

How Do I Improve Performance in ASP.NET Core APIs?

🧠 Let's think about Performance

When you think about software performance, it's more than just one part of the code. For a Web application, performance starts at the lowest layer of your infrastructure, that is, the hardware and its resources (memory, storage, network), then the Web server, the database, and finally the code. Here is an article I wrote on this topic: Top 10 Tips For Building High Performance Websites.

This article focuses on API performance and how to architect an API project.

Speed Up Your .NET API

🎯 Step 1: Measure the right numbers before you tune

Before you start changing code, you must find the cause. Start by running some analysis on your code. There are many tools available today, including the Copilot in your IDE. If you do not use Copilot, I highly recommend spending a few bucks on a Copilot license. The first thing to do is ask Copilot to analyze your code.

Performance work fails when teams only look at average latency, because averages hide user pain and tail spikes.

Track these per endpoint and per dependency:

  • Request rate

  • p50, p95, and p99 latency

  • Error rate

  • CPU percent

  • Memory and GC time

  • Allocation rate

  • Thread pool starvation signals

  • Database duration and row counts

  • Outbound HTTP duration and error rate

If you do not have distributed tracing, start there, because you will otherwise spend time optimizing the wrong layer. Logging alone is not observability, because it rarely shows timing structure across services.
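As a concrete starting point, distributed tracing can be wired up with OpenTelemetry. This is a minimal sketch, assuming the OpenTelemetry.Extensions.Hosting, OpenTelemetry.Instrumentation.AspNetCore, and OpenTelemetry.Instrumentation.Http packages are referenced; the exporter target comes from your own collector setup.

```csharp
var builder = WebApplication.CreateBuilder(args);

// Minimal OpenTelemetry tracing: spans for incoming requests and
// outbound HTTP calls, exported over OTLP to your collector.
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()  // incoming request spans
        .AddHttpClientInstrumentation()  // outbound dependency spans
        .AddOtlpExporter());             // endpoint configured via OTEL_* env vars

var app = builder.Build();
app.MapGet("/ping", () => Results.Ok());
app.Run();
```

With this in place, a single slow request shows its time split across the API and its dependencies, which is exactly what the bottleneck map in the next step needs.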

🧱 Step 2: Fix the biggest problem first with a bottleneck map

In most real APIs, the time goes into one of these buckets:

  • Database queries and ORM overhead

  • Outbound HTTP calls

  • Serialization and payload size

  • Excessive allocations and GC pressure

  • Thread pool blocking caused by sync over async

  • Too much work in middleware or filters

  • Chatty endpoints that call many dependencies

A practical way to find the bottleneck is to pick a slow endpoint, capture a trace, and write down the percentage of time spent in each dependency. If 70 percent is the database, spending weeks micro optimizing JSON formatting is wasted effort.

⚙️ Step 3: Make the request pipeline smaller and cheaper

Prefer minimal middleware and avoid work for every request

Each middleware is a function call, and some middlewares do expensive work even when an endpoint does not need it. Make sure the expensive parts of your pipeline are conditional, and keep the common path thin.
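One way to keep expensive middleware conditional is `UseWhen`, which branches the pipeline only for matching requests. The `/reports` path and `AuditMiddleware` below are hypothetical examples, not part of the framework.

```csharp
var app = builder.Build();

// Run the expensive auditing middleware only for report endpoints,
// so the common path stays thin.
app.UseWhen(
    ctx => ctx.Request.Path.StartsWithSegments("/reports"),
    branch => branch.UseMiddleware<AuditMiddleware>()); // hypothetical middleware

app.MapGet("/health", () => Results.Ok()); // cheap path, pays no audit cost
app.Run();
```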

Enable response compression when it helps

Compression reduces payload size and can reduce latency over slow networks, but it uses CPU, so it must be evaluated with real traffic. It tends to help more for large JSON responses and less for small payloads.
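Enabling compression is a one-time registration; the sketch below shows the standard middleware, with HTTPS compression called out because it needs a security review before you turn it on.

```csharp
// Opt in to response compression; measure the CPU cost with real traffic
// before rolling it out broadly.
builder.Services.AddResponseCompression(options =>
{
    options.EnableForHttps = true; // evaluate compression-over-TLS risks first
});

var app = builder.Build();
app.UseResponseCompression(); // must run before middleware that writes the body
```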

Avoid per request allocations in middleware

If your middleware creates strings, dictionaries, regex objects, or large objects per request, you are paying a GC tax at scale. Cache static data, use pooled structures when appropriate, and prefer precompiled patterns for hot paths.
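For the regex case specifically, the source-generated `[GeneratedRegex]` attribute (available in .NET 7 and later) compiles the pattern once at build time instead of allocating a `Regex` per request. The order-id pattern here is a made-up example.

```csharp
using System.Text.RegularExpressions;

public static partial class OrderIdValidator
{
    // Source-generated regex: compiled at build time, reused as a singleton,
    // so the hot path never allocates a new Regex instance.
    [GeneratedRegex(@"^ORD-\d{6}$")]
    private static partial Regex OrderIdPattern();

    public static bool IsValid(string candidate) =>
        OrderIdPattern().IsMatch(candidate);
}
```

On older runtimes, a `static readonly Regex` created with `RegexOptions.Compiled` achieves the same reuse.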

🧭 Step 4: Be intentional about routing and endpoint design

Avoid chatty endpoints

If the client needs five calls to render one view, your API is slow even if each call is “fast.” Consolidate into fewer endpoints that return exactly what the UI needs, but do not overfetch massive objects either, because big responses increase serialization time and memory.

Avoid overly generic endpoints on hot paths

Endpoints that take complex filter objects, dynamic sorts, and huge includes often generate complex queries and unpredictable plans. For hot paths, design explicit endpoints and explicit query shapes.

📦 Step 5: Reduce JSON cost and payload size

Serialization is not usually the top cost, but it becomes meaningful at high RPS or when payloads are large.

Practical tactics:

  • Return only the fields needed, avoid returning giant domain objects

  • Avoid cycles, avoid deep graphs, avoid unnecessary nested collections

  • Use pagination by default for collections

  • Avoid writing custom converters unless you have measured a real gain
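The pagination-by-default tactic can be sketched as a minimal API endpoint. `AppDbContext`, `Products`, and `ProductDto` are assumptions standing in for your own model.

```csharp
// A paged endpoint with a bounded page size, so clients can never
// request an unbounded payload.
app.MapGet("/products", async (
    AppDbContext db, CancellationToken ct, int page = 1, int pageSize = 20) =>
{
    pageSize = Math.Clamp(pageSize, 1, 100); // hard upper bound on payload size

    var items = await db.Products
        .AsNoTracking()
        .OrderBy(p => p.Id) // stable ordering is required for correct paging
        .Skip((page - 1) * pageSize)
        .Take(pageSize)
        .Select(p => new ProductDto(p.Id, p.Name, p.Price))
        .ToListAsync(ct);

    return Results.Ok(items);
});
```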

If you control the client, consider whether you need JSON for everything. For internal service to service calls, gRPC can reduce payload size and CPU cost, but it introduces a different operational model, so treat it as an engineering decision, not a trend.

🗄️ Step 6: Caching, the only honest shortcut

Caching is the most effective performance tool when used correctly, because it eliminates work entirely.

Cache on the client and at the edge

If responses are cacheable, use proper cache headers and allow CDNs or gateways to serve responses without touching your API.
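Edge caching mostly comes down to setting the right `Cache-Control` header. The sketch below assumes a slowly changing `/categories` lookup; the five-minute lifetime is an example value to tune per endpoint.

```csharp
// Mark a public, slowly changing response as cacheable so CDNs and
// gateways can serve it without touching the API.
app.MapGet("/categories", async (AppDbContext db, HttpContext ctx, CancellationToken ct) =>
{
    ctx.Response.Headers.CacheControl = "public,max-age=300"; // 5 minutes, example value

    var categories = await db.Categories
        .AsNoTracking()
        .ToListAsync(ct);

    return Results.Ok(categories);
});
```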

Use server side caching for expensive results

In memory caching is fast but limited to one instance, which is fine for single instance or when the data is truly per instance. Distributed caching is more suitable when you have multiple replicas and want consistent hit rates.

The key is invalidation and freshness. Cache what is either immutable, slowly changing, or acceptable to be slightly stale. Do not cache data where correctness depends on millisecond freshness unless you have a strong strategy.
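A server-side caching sketch using `IMemoryCache`, under the assumption that slightly stale prices are acceptable for this endpoint; `AppDbContext`, `Prices`, and `PriceDto` are hypothetical names.

```csharp
using Microsoft.Extensions.Caching.Memory;

public sealed class PricingService
{
    private readonly IMemoryCache _cache;
    private readonly AppDbContext _db;

    public PricingService(IMemoryCache cache, AppDbContext db)
        => (_cache, _db) = (cache, db);

    // Expensive read cached for a short window; 30 seconds of staleness
    // is an assumption you must validate for your domain.
    public Task<List<PriceDto>?> GetPricesAsync(CancellationToken ct) =>
        _cache.GetOrCreateAsync("prices:all", async entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(30);

            return await _db.Prices
                .AsNoTracking()
                .Select(p => new PriceDto(p.Id, p.Amount))
                .ToListAsync(ct);
        });
}
```

The same shape works with a distributed cache when you run multiple replicas; only the storage behind the lookup changes.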

🧠 Step 7: EF Core and database performance

Most API latency comes from the database layer, so improving query shape and reducing round trips usually beats any framework tweak.

Use AsNoTracking for read heavy queries

Tracking adds overhead and memory pressure. For read only queries, disable tracking.

// AsNoTracking skips change tracking; the projection loads only the columns the DTO needs
var products = await db.Products
    .AsNoTracking()
    .Where(p => p.IsActive)
    .Select(p => new ProductDto(p.Id, p.Name, p.Price))
    .ToListAsync(ct);

Project to DTOs instead of loading full entities

Projection reduces materialization work and avoids bringing columns you do not need.

Avoid N plus 1 problems

If you see many similar queries per request, you are likely triggering repeated lazy loading. Prefer explicit includes when needed, but do not blindly include everything, because huge joins and cartesian explosions can be worse.

Use compiled queries for very hot paths

For endpoints hit extremely often, compiled queries can reduce overhead, but only use them after measurement, because maintainability matters.
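A compiled query caches EF Core's query translation so it is not recomputed per call. This sketch assumes an `AppDbContext` with a `Products` set; the lookup itself is an example.

```csharp
// Compiled once, reused for every request: EF skips re-translating
// the expression tree for this very hot lookup.
private static readonly Func<AppDbContext, int, CancellationToken, Task<Product?>>
    GetProductById = EF.CompileAsyncQuery(
        (AppDbContext db, int id, CancellationToken ct) =>
            db.Products.AsNoTracking().FirstOrDefault(p => p.Id == id));

// Usage inside a handler:
// var product = await GetProductById(db, id, ct);
```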

Make sure the database has the right indexes

App code cannot outrun missing indexes. If your hot query filters on two columns, you probably need an index designed for that access pattern, and you must validate with the database execution plan, not assumptions.
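If you manage the schema with EF Core migrations, a composite index can be declared in the model. The `(CustomerId, Status)` pair on a hypothetical `Order` entity is an example; the real column choice must come from your execution plan.

```csharp
// EF Core model configuration: a composite index matching a hot filter
// on (CustomerId, Status). Validate column order against the actual plan.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Order>()
        .HasIndex(o => new { o.CustomerId, o.Status })
        .HasDatabaseName("IX_Orders_CustomerId_Status");
}
```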

🌐 Step 8: Outbound HTTP and dependency calls

Outbound HTTP calls often dominate latency and create tail spikes.

Use IHttpClientFactory and reuse connections

Creating new HttpClient instances incorrectly can lead to socket exhaustion and poor performance.
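The factory pattern is a one-line registration per client. The base address and `CatalogClient` type below are placeholders for your own service.

```csharp
// Register a typed client once; the factory pools message handlers and
// recycles them so DNS changes are picked up.
builder.Services.AddHttpClient<CatalogClient>(client =>
{
    client.BaseAddress = new Uri("https://catalog.internal.example"); // placeholder
    client.Timeout = TimeSpan.FromSeconds(5);
})
.SetHandlerLifetime(TimeSpan.FromMinutes(5)); // rotate pooled handlers periodically
```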

Use timeouts and cancellation tokens everywhere

A slow dependency without timeouts becomes your performance problem.

// Link to the request token so a client disconnect still cancels the call,
// then add a hard 2 second budget for this dependency
using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
cts.CancelAfter(TimeSpan.FromSeconds(2));

var response = await httpClient.GetAsync(url, cts.Token);
response.EnsureSuccessStatusCode();

Use concurrency intentionally

If you need data from two independent services, fetch them concurrently, but do not do this blindly, because it increases load and can create dependency storms.

// Start both calls without awaiting so they run concurrently
var aTask = serviceA.GetAsync(ct);
var bTask = serviceB.GetAsync(ct);

await Task.WhenAll(aTask, bTask);

// .Result is safe here because both tasks have already completed
return new CombinedDto(aTask.Result, bTask.Result);

🧵 Step 9: Async correctness and thread pool health

A surprising amount of ASP.NET Core “performance issues” are actually thread pool starvation caused by blocking calls.

Avoid these patterns on request threads

  • Task.Result or Task.Wait

  • Calling synchronous database APIs

  • Using lock heavily on hot paths

  • Using Thread.Sleep

  • Doing CPU heavy work inside request handlers without offloading and limiting

If you must do CPU heavy work, consider queueing it to a background worker and returning a job id, because CPU bound work in the request path reduces throughput and increases tail latency.
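One common way to get CPU heavy work off the request path is a bounded channel drained by a hosted service. This is a minimal sketch of that pattern; all the type names are assumptions.

```csharp
using System.Threading.Channels;

// Bounded queue: when full, enqueue waits, giving natural backpressure
// instead of unbounded memory growth.
public sealed class WorkQueue
{
    private readonly Channel<Func<CancellationToken, ValueTask>> _channel =
        Channel.CreateBounded<Func<CancellationToken, ValueTask>>(capacity: 100);

    public ValueTask EnqueueAsync(
        Func<CancellationToken, ValueTask> work, CancellationToken ct) =>
        _channel.Writer.WriteAsync(work, ct);

    public IAsyncEnumerable<Func<CancellationToken, ValueTask>> ReadAllAsync(
        CancellationToken ct) => _channel.Reader.ReadAllAsync(ct);
}

// Hosted service that drains the queue off the request threads.
public sealed class WorkQueueProcessor : BackgroundService
{
    private readonly WorkQueue _queue;
    public WorkQueueProcessor(WorkQueue queue) => _queue = queue;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await foreach (var work in _queue.ReadAllAsync(stoppingToken))
            await work(stoppingToken); // CPU heavy work runs here, not per request
    }
}
```

The request handler enqueues and returns a job id immediately; throughput stays stable because request threads never carry the CPU cost.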

🧰 Step 10: Logging and exceptions on hot paths

Logging is necessary, but logging too much in hot endpoints can slow you down through formatting cost, allocation cost, and IO back pressure.

Practical guidance

  • Log structured data, avoid string interpolation on hot paths

  • Avoid logging large payloads

  • Use log levels correctly, and do not log information level for every request in high traffic APIs

  • Avoid using exceptions for normal control flow, because exceptions are expensive and also pollute telemetry
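The structured-logging advice above can be sketched with the `[LoggerMessage]` source generator, which skips formatting and boxing entirely when the level is disabled. The event id and message template are example values.

```csharp
// Source-generated log method: zero formatting cost when Warning is off,
// and the message template stays structured for your log backend.
public static partial class ApiLog
{
    [LoggerMessage(EventId = 1001, Level = LogLevel.Warning,
        Message = "Slow dependency {Dependency} took {ElapsedMs} ms")]
    public static partial void SlowDependency(
        ILogger logger, string dependency, double elapsedMs);
}

// Usage: ApiLog.SlowDependency(logger, "catalog", 1875.4);
```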

🔒 Step 11: Security features without performance tax surprises

Authentication and authorization can be expensive if misconfigured, especially if you validate tokens repeatedly with remote calls or you do heavy policy checks on every request.

Cache token validation metadata when possible, keep policy evaluation efficient, and avoid doing database lookups inside authorization handlers unless absolutely necessary, because that forces your security layer to become a latency multiplier.

🚀 Step 12: Hosting and deployment optimizations

Use the right runtime mode

Use Release builds, enable ReadyToRun when it helps startup, and apply trimming only after you have validated the impact. The wrong publish settings can increase startup time or break reflection heavy libraries.

Scale out before scaling up when possible

Cloud native APIs often get better throughput and stability by adding replicas rather than trying to push one instance to the limit, because tail latency and GC pauses become less catastrophic when load is spread.

Validate Kestrel and reverse proxy settings

Misconfigured proxies can add buffering, compression overhead, header limits, or timeouts that appear as API slowness. Ensure timeouts match your resilience strategy, because an API with a 30 second proxy timeout will still look broken if your downstream has a 2 second timeout and you do not handle it gracefully.

✅ A practical performance checklist you can run per endpoint

| Area | What to check | What good looks like |
| --- | --- | --- |
| Query shape | Are you projecting DTOs, using AsNoTracking, avoiding N plus 1 | One query per need, minimal columns, predictable plans |
| Payload | Are you returning only what is needed, paging collections | Small responses, stable schema, no deep graphs |
| Caching | Can results be cached at edge or server | High cache hit rate for expensive reads |
| Outbound calls | Are timeouts, retries, and concurrency controlled | No long hangs, bounded retries, cancellation everywhere |
| Allocations | Are you allocating large objects per request | Low allocation rate, stable GC time |
| Async | Any sync blocking calls | Pure async flow in request path |
| Logging | Any heavy string building or noisy logs | Structured logs, minimal overhead |
| Limits | Request body limits, rate limiting, circuit breakers | System degrades gracefully under stress |

❓ Top 5 FAQs about ASP.NET Core API performance

1. Should I use minimal APIs for performance

Minimal APIs can reduce overhead and improve throughput for certain styles of endpoints, but the biggest wins usually come from database, caching, and dependency tuning. Minimal APIs are not a magic lever; they are a good fit when you want a lean pipeline and simple endpoints.

2. What is the fastest way to reduce latency

Caching the response or caching the expensive parts of the computation is usually the fastest win, because you remove work entirely, while query shape improvements are typically the next largest win because database time dominates most APIs.

3. Is EF Core slow

EF Core is not inherently slow, but it can become slow when you track everything, load full entities with huge graphs, generate poor queries, or miss indexes, so most EF performance work is really query design and database design.

4. When should I use gRPC instead of REST

When service to service calls are high volume, low latency, and under your control, gRPC can reduce payload and CPU cost, but it requires consistent client support and operational maturity, so it is usually a platform choice rather than an endpoint by endpoint choice.

5. How do I reduce p99 latency spikes

You reduce tail latency by bounding dependencies with timeouts, using circuit breakers, eliminating thread pool starvation, preventing long database queries, controlling concurrency, and keeping memory allocation stable so GC pauses do not cluster during traffic spikes.