🧠 Let's think about Performance
When you think about software performance, it's more than just one part of the code. In a Web application, performance starts at the lowest layer of your infrastructure, the hardware and its resources (memory, storage, network), then moves up through the Web server, the database, and finally the code. Here is an article I wrote on this topic: Top 10 Tips For Building High Performance Websites.
This article focuses on API performance and how to architect an API project.
🎯 Step 1: Measure the right numbers before you tune
Before you start typing code, you must find the reason for the slowness. Start by running some analysis on your code. Many tools are available today, including the Copilot in your IDE. If you do not use Copilot, I highly recommend spending a few bucks on a Copilot license. The first thing to do is ask Copilot to analyze your code.
Performance work fails when teams only look at average latency, because averages hide user pain and hide tail spikes.
Track these per endpoint and per dependency:
Request rate
p50, p95, p99 latency
Error rate
CPU percent
Memory and GC time
Allocation rate
Thread pool starvation signals
Database duration and row counts
Outbound HTTP duration and error rate
If you do not have distributed tracing, start there, because you will otherwise spend time optimizing the wrong layer. Logging alone is not observability, because it rarely shows timing structure across services.
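To see timing structure without adopting a full tracing stack on day one, you can start with the built-in `ActivitySource` API, which OpenTelemetry exporters understand. A minimal sketch, where the service, span, and helper names are illustrative:

```csharp
using System.Diagnostics;

public class OrderService
{
    // One ActivitySource per logical component; exporters subscribe to it.
    private static readonly ActivitySource Source = new("MyApi.Orders");

    public async Task<OrderDto?> GetOrderAsync(int id, CancellationToken ct)
    {
        // The span records start time, duration, and parent/child links.
        using var activity = Source.StartActivity("GetOrder");
        activity?.SetTag("order.id", id);

        // Database and outbound calls made here show up inside this span.
        return await LoadFromDatabaseAsync(id, ct); // hypothetical helper
    }
}
```

Once spans exist, a collector can show you exactly which dependency owns each slice of a slow request.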
🧱 Step 2: Fix the biggest problem first with a bottleneck map
In most real APIs, the time goes into one of these buckets:
Database queries and ORM overhead
Outbound HTTP calls
Serialization and payload size
Excessive allocations and GC pressure
Thread pool blocking caused by sync over async
Too much work in middleware or filters
Chatty endpoints that call many dependencies
A practical way to find the bottleneck is to pick a slow endpoint, capture a trace, and write down the percentage of time spent in each dependency. If 70 percent is the database, spending weeks micro optimizing JSON formatting is wasted effort.
⚙️ Step 3: Make the request pipeline smaller and cheaper
Prefer minimal middleware and avoid work for every request
Each middleware is a function call, and some middlewares do expensive work even when an endpoint does not need it. Make sure the expensive parts of your pipeline are conditional, and keep the common path thin.
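One way to keep the common path thin is branching the pipeline with `UseWhen`, so expensive middleware only runs for the routes that need it. A sketch, where the path and the middleware type are illustrative:

```csharp
// Run the expensive audit middleware only for /reports requests.
app.UseWhen(
    ctx => ctx.Request.Path.StartsWithSegments("/reports"),
    branch => branch.UseMiddleware<ExpensiveAuditMiddleware>()); // hypothetical middleware

// Every other endpoint skips the audit work entirely.
app.MapGet("/health", () => Results.Ok());
```

The predicate is evaluated per request, but that check is far cheaper than running the middleware itself everywhere.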
Enable response compression when it helps
Compression reduces payload size and can reduce latency over slow networks, but it uses CPU, so it must be evaluated with real traffic. It tends to help more for large JSON responses and less for small payloads.
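A minimal sketch of enabling the built-in response compression middleware; evaluate the CPU cost with real traffic before shipping it:

```csharp
using Microsoft.AspNetCore.ResponseCompression;

builder.Services.AddResponseCompression(options =>
{
    // Enabling compression over HTTPS has BREACH-style implications;
    // understand them before turning this on for sensitive responses.
    options.EnableForHttps = true;
    options.Providers.Add<BrotliCompressionProvider>();
});

var app = builder.Build();

// Must run early so later middleware writes through the compressor.
app.UseResponseCompression();
```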
Avoid per request allocations in middleware
If your middleware creates strings, dictionaries, regex objects, or large objects per request, you are paying a GC tax at scale. Cache static data, use pooled structures when appropriate, and prefer pre compiled patterns for hot paths.
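For regex specifically, a sketch of the pre compiled approach using the source generator available in .NET 7 and later; the pattern is illustrative:

```csharp
using System.Text.RegularExpressions;

public static partial class RouteValidators
{
    // The matcher is generated once at build time, so the hot path
    // pays no per-request compilation or allocation for the regex.
    [GeneratedRegex(@"^[a-z0-9-]{1,64}$")]
    public static partial Regex SlugPattern();
}

// Hot path usage:
bool ok = RouteValidators.SlugPattern().IsMatch(slug);
```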
🧭 Step 4: Be intentional about routing and endpoint design
Avoid chatty endpoints
If the client needs five calls to render one view, your API is slow even if each call is “fast.” Consolidate into fewer endpoints that return exactly what the UI needs, but do not overfetch massive objects either, because big responses increase serialization time and memory.
Avoid overly generic endpoints on hot paths
Endpoints that take complex filter objects, dynamic sorts, and huge includes often generate complex queries and unpredictable plans. For hot paths, design explicit endpoints and explicit query shapes.
📦 Step 5: Reduce JSON cost and payload size
Serialization is not usually the top cost, but it becomes meaningful at high RPS or when payloads are large.
Practical tactics:
Return only the fields needed, avoid returning giant domain objects
Avoid cycles, avoid deep graphs, avoid unnecessary nested collections
Use pagination by default for collections
Avoid writing custom converters unless you have measured a real gain
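A sketch of pagination by default on a collection endpoint; the parameter names, the 100-item cap, and `AppDbContext` are illustrative choices, not a prescribed API:

```csharp
app.MapGet("/products", async (
    int page, int pageSize, AppDbContext db, CancellationToken ct) =>
{
    // Clamp so a client cannot request an unbounded page.
    pageSize = Math.Clamp(pageSize, 1, 100);

    var items = await db.Products
        .AsNoTracking()
        .OrderBy(p => p.Id)                 // stable order is required for paging
        .Skip((page - 1) * pageSize)
        .Take(pageSize)
        .Select(p => new ProductDto(p.Id, p.Name, p.Price))
        .ToListAsync(ct);

    return Results.Ok(items);
});
```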
If you control the client, consider whether you need JSON for everything. For internal service to service calls, gRPC can reduce payload size and CPU cost, but it introduces a different operational model, so treat it as an engineering decision, not a trend.
🗄️ Step 6: Caching, the only honest shortcut
Caching is the most effective performance tool when used correctly, because it eliminates work entirely.
Cache on the client and at the edge
If responses are cacheable, use proper cache headers and allow CDNs or gateways to serve responses without touching your API.
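A minimal sketch of setting a cache header so a CDN or gateway can serve the response; the endpoint and the five-minute TTL are illustrative:

```csharp
app.MapGet("/catalog/categories", (HttpContext ctx) =>
{
    // public: shared caches may store it; max-age in seconds.
    ctx.Response.Headers.CacheControl = "public, max-age=300";
    return Results.Ok(GetCategories()); // hypothetical lookup
});
```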
Use server side caching for expensive results
In memory caching is fast but limited to one instance, which is fine for single instance or when the data is truly per instance. Distributed caching is more suitable when you have multiple replicas and want consistent hit rates.
The key is invalidation and freshness. Cache what is either immutable, slowly changing, or acceptable to be slightly stale. Do not cache data where correctness depends on millisecond freshness unless you have a strong strategy.
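A sketch of in memory server side caching with `IMemoryCache` for a slowly changing read; the cache key, TTL, and `AppDbContext` are illustrative:

```csharp
using Microsoft.Extensions.Caching.Memory;

public class PriceService(IMemoryCache cache, AppDbContext db)
{
    public Task<List<PriceDto>> GetPricesAsync(CancellationToken ct) =>
        cache.GetOrCreateAsync("prices:all", async entry =>
        {
            // Slightly stale is acceptable here; 10 minutes bounds staleness.
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10);
            return await db.Prices.AsNoTracking()
                .Select(p => new PriceDto(p.Id, p.Amount))
                .ToListAsync(ct);
        })!;
}
```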
🧠 Step 7: EF Core and database performance
Most API latency comes from the database layer, so improving query shape and reducing round trips usually beats any framework tweak.
Use AsNoTracking for read heavy queries
Tracking adds overhead and memory pressure. For read only queries, disable tracking.
```csharp
var products = await db.Products
    .AsNoTracking()
    .Where(p => p.IsActive)
    .Select(p => new ProductDto(p.Id, p.Name, p.Price))
    .ToListAsync(ct);
```
Project to DTOs instead of loading full entities
Projection reduces materialization work and avoids bringing columns you do not need.
Avoid N plus 1 problems
If you see many similar queries per request, you are doing repeated lazy loading patterns. Prefer explicit includes when needed, but do not blindly include everything, because huge joins and cartesian explosions can be worse.
Use compiled queries for very hot paths
For endpoints hit extremely often, compiled queries can reduce overhead, but only use them after measurement, because maintainability matters.
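A sketch of an EF Core compiled query for a very hot lookup; `AppDbContext` and `ProductDto` are illustrative names:

```csharp
// The LINQ expression is translated once and cached in this delegate,
// instead of being recompiled on every call.
private static readonly Func<AppDbContext, int, Task<ProductDto?>> GetProductById =
    EF.CompileAsyncQuery((AppDbContext db, int id) =>
        db.Products.AsNoTracking()
          .Where(p => p.Id == id)
          .Select(p => new ProductDto(p.Id, p.Name, p.Price))
          .FirstOrDefault());

// Usage: var product = await GetProductById(db, 42);
```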
Make sure the database has the right indexes
App code cannot outrun missing indexes. If your hot query filters on two columns, you probably need an index designed for that access pattern, and you must validate it with the database execution plan, not assumptions.
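If you manage the schema through EF Core migrations, a sketch of declaring a composite index for that two-column filter; the entity and column names are illustrative, and you should still confirm the plan in the database:

```csharp
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Composite index matching a hot query that filters on
    // CustomerId and CreatedUtc together.
    modelBuilder.Entity<Order>()
        .HasIndex(o => new { o.CustomerId, o.CreatedUtc });
}
```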
🌐 Step 8: Outbound HTTP and dependency calls
Outbound HTTP calls often dominate latency and create tail spikes.
Use IHttpClientFactory and reuse connections
Creating new HttpClient instances incorrectly can lead to socket exhaustion and poor performance.
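A sketch of registering a typed client through `IHttpClientFactory`, so handler lifetimes and connections are pooled; the client name and base URL are assumptions:

```csharp
builder.Services.AddHttpClient<CatalogClient>(client =>
{
    client.BaseAddress = new Uri("https://catalog.internal/"); // assumed URL
    client.Timeout = TimeSpan.FromSeconds(5);
});

public class CatalogClient(HttpClient http)
{
    // The injected HttpClient shares pooled handlers with other instances.
    public Task<string> GetItemAsync(int id, CancellationToken ct) =>
        http.GetStringAsync($"items/{id}", ct);
}
```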
Use timeouts and cancellation tokens everywhere
A slow dependency without timeouts becomes your performance problem.
```csharp
using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
cts.CancelAfter(TimeSpan.FromSeconds(2));
var response = await httpClient.GetAsync(url, cts.Token);
response.EnsureSuccessStatusCode();
```
Use concurrency intentionally
If you need data from two independent services, fetch them concurrently, but do not do this blindly, because it increases load and can create dependency storms.
```csharp
var aTask = serviceA.GetAsync(ct);
var bTask = serviceB.GetAsync(ct);
await Task.WhenAll(aTask, bTask);
return new CombinedDto(aTask.Result, bTask.Result);
```
🧵 Step 9: Async correctness and thread pool health
A surprising amount of ASP.NET Core “performance issues” are actually thread pool starvation caused by blocking calls.
Avoid these patterns on request threads
Task.Result or Task.Wait
Calling synchronous database APIs
Using lock heavily on hot paths
Using Thread.Sleep
Doing CPU heavy work inside request handlers without offloading and limiting
If you must do CPU heavy work, consider queueing it to a background worker and returning a job id, because CPU bound work in the request path reduces throughput and increases tail latency.
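A sketch of that pattern using a bounded `System.Threading.Channels` queue drained by a hosted worker; the `Job` shape and capacity are illustrative:

```csharp
using System.Threading.Channels;

public record Job(Guid Id, byte[] Payload);

public class JobQueue
{
    // Bounded channel: enqueue waits when full, giving back pressure
    // instead of unbounded memory growth.
    private readonly Channel<Job> _channel =
        Channel.CreateBounded<Job>(capacity: 100);

    public ValueTask EnqueueAsync(Job job, CancellationToken ct) =>
        _channel.Writer.WriteAsync(job, ct);

    public IAsyncEnumerable<Job> DequeueAllAsync(CancellationToken ct) =>
        _channel.Reader.ReadAllAsync(ct);
}

public class JobWorker(JobQueue queue) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        await foreach (var job in queue.DequeueAllAsync(ct))
        {
            // CPU-heavy processing happens here, off the request path.
        }
    }
}
```

The endpoint enqueues the job, returns a job id immediately, and the client polls or subscribes for the result.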
🧰 Step 10: Logging and exceptions on hot paths
Logging is necessary, but logging too much in hot endpoints can slow you down through formatting cost, allocation cost, and IO back pressure.
Practical guidance
Log structured data, avoid string interpolation on hot paths
Avoid logging large payloads
Use log levels correctly, and do not log information level for every request in high traffic APIs
Avoid using exceptions for normal control flow, because exceptions are expensive and also pollute telemetry
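A sketch of structured, source-generated logging with the `LoggerMessage` attribute, which avoids string formatting and boxing when the level is filtered out; the message and names are illustrative:

```csharp
using Microsoft.Extensions.Logging;

public static partial class ApiLog
{
    // The generator emits a cached delegate; no allocation occurs
    // when Warning is disabled for this category.
    [LoggerMessage(Level = LogLevel.Warning,
        Message = "Slow dependency {Dependency} took {ElapsedMs} ms")]
    public static partial void SlowDependency(
        ILogger logger, string dependency, long elapsedMs);
}

// Hot path usage:
ApiLog.SlowDependency(logger, "catalog-api", 2100);
```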
🔒 Step 11: Security features without performance tax surprises
Authentication and authorization can be expensive if misconfigured, especially if you validate tokens repeatedly with remote calls or you do heavy policy checks on every request.
Cache token validation metadata when possible, keep policy evaluation efficient, and avoid doing database lookups inside authorization handlers unless absolutely necessary, because that forces your security layer to become a latency multiplier.
🚀 Step 12: Hosting and deployment optimizations
Use the right runtime mode
Use Release builds, enable ReadyToRun compilation when it helps startup, and apply trimming only when you have validated the impact. The wrong publish settings can increase startup time or break reflection heavy libraries.
Scale out before scaling up when possible
Cloud native APIs often get better throughput and stability by adding replicas rather than trying to push one instance to the limit, because tail latency and GC pauses become less catastrophic when load is spread.
Validate Kestrel and reverse proxy settings
Misconfigured proxies can add buffering, compression overhead, header limits, or timeouts that appear as API slowness. Ensure timeouts match your resilience strategy, because an API with a 30 second proxy timeout will still look broken if your downstream has a 2 second timeout and you do not handle it gracefully.
✅ A practical performance checklist you can run per endpoint
| Area | What to check | What good looks like |
|---|---|---|
| Query shape | Are you projecting DTOs, using AsNoTracking, avoiding N plus 1 | One query per need, minimal columns, predictable plans |
| Payload | Are you returning only what is needed, paging collections | Small responses, stable schema, no deep graphs |
| Caching | Can results be cached at edge or server | High cache hit rate for expensive reads |
| Outbound calls | Are timeouts, retries, and concurrency controlled | No long hangs, bounded retries, cancellation everywhere |
| Allocations | Are you allocating large objects per request | Low allocation rate, stable GC time |
| Async | Any sync blocking calls | Pure async flow in request path |
| Logging | Any heavy string building or noisy logs | Structured logs, minimal overhead |
| Limits | Request body limits, rate limiting, circuit breakers | System degrades gracefully under stress |
❓ Top 5 FAQs about ASP.NET Core API performance
1. Should I use minimal APIs for performance?
Minimal APIs can reduce overhead and improve throughput for certain styles of endpoints, but the biggest wins usually come from database, caching, and dependency tuning, so minimal APIs are not a magic lever, they are a good fit when you want a lean pipeline and simple endpoints.
2. What is the fastest way to reduce latency?
Caching the response or caching the expensive parts of the computation is usually the fastest win, because you remove work entirely, while query shape improvements are typically the next largest win because database time dominates most APIs.
3. Is EF Core slow?
EF Core is not inherently slow, but it can become slow when you track everything, load full entities with huge graphs, generate poor queries, or miss indexes, so most EF performance work is really query design and database design.
4. When should I use gRPC instead of REST?
When service to service calls are high volume, low latency, and under your control, gRPC can reduce payload and CPU cost, but it requires consistent client support and operational maturity, so it is usually a platform choice rather than an endpoint by endpoint choice.
5. How do I reduce p99 latency spikes?
You reduce tail latency by bounding dependencies with timeouts, using circuit breakers, eliminating thread pool starvation, preventing long database queries, controlling concurrency, and keeping memory allocation stable so GC pauses do not cluster during traffic spikes.