Introduction
When systems grow beyond a single process, debugging and observability become hard. A single user action may touch multiple services, background workers, databases and third-party APIs. Without a unified correlation model, finding the root cause of a failure becomes a scatter-gun exercise: you search logs in Service A, then Service B, then the queue, hoping timestamps line up.
A Unified Logging Correlation Model solves this by propagating a shared correlation context — typically a Trace ID — across every hop of a request. When logs, traces and metrics all include that same Trace ID, you can reconstruct the entire transaction across your distributed system.
This article gives a production-ready blueprint: design principles, propagation techniques (HTTP, messaging, DB calls), OpenTelemetry integration, sampling, performance trade-offs, security and concrete .NET and Angular code examples. Diagrams use block-style layout for clarity.
Goals and Non-Goals
Goals
Generate a stable Trace ID for each external request or scheduled job.
Propagate the Trace ID through all service boundaries: HTTP, gRPC, message buses, background workers.
Attach the Trace ID to logs, traces, and metrics.
Support distributed tracing (spans) with OpenTelemetry.
Preserve minimal overhead and ensure safe sampling.
Allow ad-hoc correlation from logs and dashboards.
Non-Goals
Replacing business-level correlation IDs (orderId, invoiceId). Use both: Trace ID for technical flow, business ID for domain tracing.
Forcing 100% tracing of every low-value event—use sampling and tail-based strategies.
Core Concepts
Trace ID: global identifier for the whole transaction (128-bit recommended).
Span: a unit of work inside the trace (HTTP handler, DB call). Spans have start/end timestamps and parent-child relationships.
Baggage: small key-value pairs that travel with the trace across services (use sparingly).
Correlation Context: a structured map containing Trace ID, Span ID, user id, tenant id, and optional business IDs.
Propagator: code that injects / extracts trace and baggage into transport headers.
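To make the correlation context concrete, here is a minimal TypeScript sketch of one possible shape. The interface name, the field names and the toLogFields helper are illustrative assumptions, not part of any standard or SDK.

```typescript
// Hypothetical shape of a correlation context; field names are illustrative.
interface CorrelationContext {
  traceId: string;                      // 32 lowercase hex chars (128 bits)
  spanId: string;                       // 16 lowercase hex chars (64 bits)
  userId?: string;
  tenantId?: string;
  businessIds?: Record<string, string>; // e.g. { order_id: "ORD-987" }
}

// Flatten the context into snake_case fields suitable for a structured log record.
function toLogFields(ctx: CorrelationContext): Record<string, string> {
  const fields: Record<string, string> = {
    trace_id: ctx.traceId,
    span_id: ctx.spanId,
  };
  if (ctx.userId !== undefined) fields["user_id"] = ctx.userId;
  if (ctx.tenantId !== undefined) fields["tenant_id"] = ctx.tenantId;
  const business = ctx.businessIds || {};
  for (const key of Object.keys(business)) {
    fields[key] = business[key];
  }
  return fields;
}
```

Keeping the context as one small object makes it easy to hand the same values to the logger, the tracer and any outbound propagator.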
High-Level Flow (Block Diagram)
┌──────────┐    HTTP Req     ┌────────────┐    MQ Msg     ┌────────────┐
│ Browser  │ ──────────────> │ API Gateway│ ────────────> │   Worker   │
│ (Angular)│                 │   (.NET)   │               │   (.NET)   │
└────┬─────┘                 └─────┬──────┘               └─────┬──────┘
     │                             │                            │
     │ TraceID injected            │ TraceID forwarded          │ TraceID extracted
     │ (client)                    │ (HTTP headers)             │ (message attributes)
     ▼                             ▼                            ▼
Logs include traceId         Logs include traceId         Logs include traceId
Metrics tagged               Traces & spans created       Spans continue, baggage preserved
Trace ID Format and Generation
Use 128-bit Trace IDs (a 32-character lowercase hex string) for uniqueness and compatibility with W3C Trace Context and with tracing backends (Jaeger, Zipkin, and any backend that accepts OTLP). Example: 4bf92f3577b34da6a3ce929d0e0e4736.
Generate Trace ID in the first ingress point: API Gateway, Load Balancer, or front-end when initiating long flows. If an incoming request already carries a Trace ID, honor it (preserve continuity).
Trace ID must be URL-safe (hex) and reasonably short to avoid header bloat.
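Prefer a secure, collision-resistant random source (e.g., a version 4 GUID or 16 cryptographically random bytes). The all-zero value is invalid under W3C Trace Context, so regenerate if it ever occurs.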
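The generation rules above can be sketched in TypeScript as follows. The names FillRandom, toHex and newTraceId are assumptions made for illustration. The random source is injected so the sketch does not depend on a particular runtime; in a browser it could be (buf) => crypto.getRandomValues(buf).

```typescript
// Fills the buffer with random bytes and returns it. In production this should
// be backed by a cryptographically secure generator (assumption: caller supplies it).
type FillRandom = (buf: Uint8Array) => Uint8Array;

// Encode bytes as lowercase hex, two characters per byte.
function toHex(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += ("0" + bytes[i].toString(16)).slice(-2);
  }
  return out;
}

// Generate a 128-bit trace ID as 32 lowercase hex characters.
// W3C Trace Context treats the all-zero ID as invalid, so re-roll if it occurs.
function newTraceId(fillRandom: FillRandom): string {
  for (;;) {
    const id = toHex(fillRandom(new Uint8Array(16)));
    if (id !== "00000000000000000000000000000000") return id;
  }
}
```

Hex output keeps the ID URL-safe and fixed-length, which is what the traceparent header expects.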
Header Names and Standards
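Use the W3C Trace Context headers rather than custom names:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01 (version, trace ID, parent span ID, trace flags; 01 means sampled)
tracestate: vendor1=opaque,other=val (optional vendor-specific state)
baggage: key1=value1,key2=value2 (optional key-value pairs, defined by the separate W3C Baggage specification)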
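As an illustration of the traceparent layout, the following TypeScript sketch parses and formats the header. The names TraceParent, parseTraceParent and formatTraceParent are assumptions. The parser is deliberately strict and only accepts the four-field form, so it is a sketch rather than a complete implementation of the specification.

```typescript
interface TraceParent {
  version: string;  // "00" for the current W3C Trace Context version
  traceId: string;  // 32 lowercase hex chars
  parentId: string; // 16 lowercase hex chars (the caller's span ID)
  sampled: boolean; // lowest bit of the trace-flags byte
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns null for anything that is not a well-formed traceparent value.
function parseTraceParent(header: string): TraceParent | null {
  const m = TRACEPARENT_RE.exec(header.trim());
  if (m === null) return null;
  const version = m[1];
  const traceId = m[2];
  const parentId = m[3];
  const flags = m[4];
  if (version === "ff") return null;                              // reserved, invalid version
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null; // all-zero IDs are invalid
  return { version: version, traceId: traceId, parentId: parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Build a version-00 traceparent value for an outgoing call.
function formatTraceParent(traceId: string, spanId: string, sampled: boolean): string {
  return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
}
```

Rejecting malformed values early means the rest of the pipeline can treat a parsed context as trustworthy in format, though not necessarily in origin.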
Propagation Patterns
HTTP (Inbound / Outbound)
Extract: At the beginning of an HTTP request, extract traceparent and baggage headers. If missing, create a new Trace ID and root span.
Inject: When making outbound HTTP calls, inject the current traceparent and tracestate into outgoing request headers.
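The extract and inject steps can be sketched over a plain header map. This is a simplified TypeScript illustration, not an SDK API: HeaderMap, SpanContext, extractOrCreate and injectInto are hypothetical names, header names are assumed to be lowercased already, and newHexId is an assumed helper that returns the requested number of random lowercase hex characters.

```typescript
type HeaderMap = Record<string, string>;
type NewHexId = (hexChars: number) => string;

interface SpanContext {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  sampled: boolean;
  traceState?: string;
}

const INBOUND_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Extract: continue the caller's trace, or start a new root span if none is present.
function extractOrCreate(headers: HeaderMap, newHexId: NewHexId): SpanContext {
  const m = INBOUND_RE.exec(headers["traceparent"] || "");
  if (m === null) {
    // No usable parent: new Trace ID, root span. Sampling is simplified to "always".
    return { traceId: newHexId(32), spanId: newHexId(16), sampled: true };
  }
  const ctx: SpanContext = {
    traceId: m[1],        // unchanged across every hop
    parentSpanId: m[2],   // the caller's span
    spanId: newHexId(16), // a fresh span for this hop
    sampled: (parseInt(m[3], 16) & 1) === 1,
  };
  if (headers["tracestate"]) ctx.traceState = headers["tracestate"];
  return ctx;
}

// Inject: copy the outgoing headers and stamp them with the current span's context.
function injectInto(headers: HeaderMap, ctx: SpanContext): HeaderMap {
  const out: HeaderMap = {};
  for (const key of Object.keys(headers)) out[key] = headers[key];
  out["traceparent"] = "00-" + ctx.traceId + "-" + ctx.spanId + "-" + (ctx.sampled ? "01" : "00");
  if (ctx.traceState) out["tracestate"] = ctx.traceState;
  return out;
}
```

Note that the Trace ID is carried through unchanged while the span ID in the outgoing traceparent is the current hop's own, which is how the receiver learns its parent.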
Messaging (Kafka / RabbitMQ / SQS)
Inject as message attributes/headers: place traceparent and baggage as headers on the message.
Consume: The consumer extracts the trace headers, continues the trace by creating a child span, and then processes the message. Always include the trace and span IDs in any logs from the consumer.
Background Jobs and Batch Processing
Database Calls and Non-networked Resources
OpenTelemetry Integration (Recommended)
OpenTelemetry gives a vendor-neutral SDK for traces, metrics, and logs. Core steps:
Instrument your services with OpenTelemetry SDK.
Use the W3C Trace Context propagator (the default in OpenTelemetry).
Export traces via OTLP to an OpenTelemetry Collector, which forwards them to a backend (Jaeger, Zipkin, Tempo, Datadog, New Relic).
Correlate logs by including trace_id and span_id in structured log records.
Example .NET setup
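// Program.cs
using OpenTelemetry.Trace;

// Requires OpenTelemetry.Extensions.Hosting 1.4 or later, where
// AddOpenTelemetry().WithTracing(...) is the supported entry point.
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing =>
    {
        tracing
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddSqlClientInstrumentation()
            // Example: keep 10% of new traces, but follow the caller's decision when one exists.
            .SetSampler(new ParentBasedSampler(new TraceIdRatioBasedSampler(0.1)))
            .AddOtlpExporter(opts =>
            {
                opts.Endpoint = new Uri("http://otel-collector:4317");
            });
    });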
Angular front-end can inject Trace IDs for long-running flows (optional). Use OpenTelemetry JS for browser instrumentation.
Correlating Logs, Traces and Metrics
Structured logging: Include trace_id, span_id, user_id, tenant_id, and business_id (if available) as structured fields in every log. Example JSON log:
{
"timestamp":"2025-11-21T10:00:00Z",
"level":"Error",
"message":"Payment failed",
"trace_id":"4bf92f3577b34da6a3ce929d0e0e4736",
"span_id":"00f067aa0ba902b7",
"user_id":"u-123",
"order_id":"ORD-987",
"exception":"TimeoutException"
}
.NET Serilog example:
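Log.Logger = new LoggerConfiguration()
    .Enrich.FromLogContext()
    .Enrich.WithProperty("service", "orders-api")
    .WriteTo.Console()
    .CreateLogger();

app.Use(async (ctx, next) =>
{
    // MyTracing.StartIncomingActivity is an application-specific helper (not shown)
    // that starts or continues the Activity for this request.
    using var activity = MyTracing.StartIncomingActivity(ctx);
    // PushProperty returns an IDisposable; dispose it so the properties
    // do not leak into log events written after this request.
    using (LogContext.PushProperty("trace_id", activity.TraceId.ToHexString()))
    using (LogContext.PushProperty("span_id", activity.SpanId.ToHexString()))
    {
        await next();
    }
});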
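The same enrichment idea can be sketched outside .NET. The TypeScript below is a minimal, hypothetical context-bound logger (ContextLogger, createLogger and mergeFields are illustrative names, not a real logging library): fields bound once per request are merged into every JSON line, so individual log calls cannot forget the trace_id.

```typescript
type Fields = Record<string, string>;

interface ContextLogger {
  info(message: string, extra?: Fields): void;
  error(message: string, extra?: Fields): void;
  child(more: Fields): ContextLogger;
}

// Shallow-merge two field sets; later values win.
function mergeFields(base: Fields, extra: Fields | undefined): Fields {
  const out: Fields = {};
  for (const key of Object.keys(base)) out[key] = base[key];
  if (extra !== undefined) {
    for (const key of Object.keys(extra)) out[key] = extra[key];
  }
  return out;
}

// sink receives one JSON line per record; now is injected so output is reproducible.
function createLogger(sink: (line: string) => void, bound: Fields, now: () => Date): ContextLogger {
  const write = (level: string, message: string, extra?: Fields): void => {
    const head: Fields = { timestamp: now().toISOString(), level: level, message: message };
    sink(JSON.stringify(mergeFields(mergeFields(head, bound), extra)));
  };
  return {
    info: (message, extra) => write("Information", message, extra),
    error: (message, extra) => write("Error", message, extra),
    // child() binds additional fields (for example trace_id) for one request.
    child: (more) => createLogger(sink, mergeFields(bound, more), now),
  };
}
```

A typical use would be to call child({ trace_id, span_id }) once in request middleware and pass the resulting logger down the call chain.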
Sampling Strategy
Tracing every single request at 100% is expensive. Use sampling:
Head-based sampling: Decide at trace start whether to collect spans (e.g., 1% of requests). Use higher rates for error-prone paths or admin users; whether a given request will actually fail is not yet known at this point.
Tail sampling: Buffer the spans of each trace in the collector and decide after the trace completes whether to keep it, for example when it contains errors or unusual latency (requires buffering and collector support).
Adaptive sampling: Increase sampling when error rates rise.
Always sample errors: If a span records an error, ensure the related trace is kept. Head-based sampling alone cannot guarantee this, so pair it with tail sampling.
Design sampling so that at least one trace per interesting failure is captured.
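A head-based decision can be made deterministic by deriving it from the Trace ID, so every service reaches the same answer without coordination. The TypeScript sketch below illustrates the idea; sampleByTraceId and shouldSample are hypothetical names, and the use of the low 32 bits is an assumption of this example, not the exact algorithm of any particular SDK.

```typescript
// Keep a trace when the low 32 bits of its ID fall below ratio * 2^32.
// The same Trace ID always yields the same decision.
function sampleByTraceId(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  const low32 = parseInt(traceId.slice(-8), 16);
  return low32 < ratio * 0x100000000;
}

// Parent-based wrapper: honor the caller's decision when there is one,
// otherwise fall back to the ratio for new root traces.
function shouldSample(traceId: string, parentSampled: boolean | undefined, ratio: number): boolean {
  if (parentSampled !== undefined) return parentSampled;
  return sampleByTraceId(traceId, ratio);
}
```

Honoring the parent's decision avoids broken traces in which only some of the hops were recorded.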
Baggage: What and What Not to Carry
Baggage travels with traces but increases header size and risk:
Good candidates: tenant id, environment, request source (mobile/web), debug-flags (temporary).
Bad candidates: large strings, PII, secrets (never carry passwords or tokens in baggage).
Keep baggage small (a few keys) and prefer storing large contextual data in a central store referenced by id.
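One way to enforce these limits is at serialization time. The TypeScript sketch below is an illustration under stated assumptions: the budgets (MAX_ENTRIES, MAX_HEADER_CHARS) and the deny list are made-up values, and serializeBaggage and parseBaggage are hypothetical helpers that handle only simple key=value members.

```typescript
const MAX_ENTRIES = 4;        // assumed budget: "a few keys"
const MAX_HEADER_CHARS = 256; // assumed budget for the whole header value
const DENIED_KEYS = ["password", "token", "authorization", "email"]; // hypothetical deny list
const KEY_RE = /^[A-Za-z0-9_.-]+$/;

// Serialize entries as "k1=v1,k2=v2", skipping denied keys and oversized values.
function serializeBaggage(entries: Record<string, string>): string {
  const parts: string[] = [];
  for (const key of Object.keys(entries)) {
    if (!KEY_RE.test(key) || DENIED_KEYS.indexOf(key.toLowerCase()) !== -1) continue;
    if (parts.length >= MAX_ENTRIES) break;
    const candidate = parts.concat([key + "=" + encodeURIComponent(entries[key])]);
    if (candidate.join(",").length > MAX_HEADER_CHARS) continue; // would exceed the budget
    parts.push(candidate[candidate.length - 1]);
  }
  return parts.join(",");
}

// Parse a baggage header value back into a map, tolerating malformed members.
function parseBaggage(header: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const member of header.split(",")) {
    const pair = member.split(";")[0]; // ignore optional properties after ';'
    const eq = pair.indexOf("=");
    if (eq <= 0) continue;
    const key = pair.slice(0, eq).trim();
    try {
      out[key] = decodeURIComponent(pair.slice(eq + 1).trim());
    } catch (e) {
      // Malformed percent-encoding: drop this entry rather than fail the request.
    }
  }
  return out;
}
```

Filtering on the way out means a careless caller cannot push a secret or a large blob onto every downstream request.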
Security And Privacy Considerations
No secrets in headers or baggage. Never propagate tokens, passwords or personal data through trace headers.
Mask PII in logs and traces if required by compliance. Use redaction rules in your logging pipeline.
Access controls: restrict who can view traces (PII and business-sensitive info may appear). Integrate with your IAM.
Retention: set retention windows for traces and logs based on compliance (e.g., 30–90 days). Store long-term aggregates only.
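A redaction rule in the logging pipeline might look like the following TypeScript sketch. The key list and the e-mail pattern are illustrative assumptions; real compliance rules will be broader, and redactRecord is a hypothetical helper rather than a feature of any logging product.

```typescript
const SENSITIVE_KEYS = ["password", "token", "authorization", "card_number"]; // hypothetical list
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Return a copy of the record with sensitive fields masked and e-mail
// addresses scrubbed from free text. Correlation fields pass through untouched.
function redactRecord(record: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(record)) {
    if (SENSITIVE_KEYS.indexOf(key.toLowerCase()) !== -1) {
      out[key] = "[REDACTED]";
    } else {
      out[key] = record[key].replace(EMAIL_RE, "[EMAIL]");
    }
  }
  return out;
}
```

Because trace_id and span_id are left intact, redacted records remain fully correlatable.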
Performance Considerations
Minimal per-request overhead: generating trace IDs, injecting headers and creating small spans is cheap. However, exporting every span synchronously is costly—use asynchronous batch exporters.
Use sampling to reduce collector load.
Limit baggage size to reduce header transport costs.
Avoid high-cardinality tags (e.g., unique request ids) in long-term metric stores; use them in logs/traces only.
Instrumentation guardrails: enable detailed instrumentation only for specific endpoints or for unusually large requests, rather than everywhere by default.
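The asynchronous batching idea can be sketched as a bounded queue that the request path only appends to, while a background timer drains it. The TypeScript below is a simplified illustration: createBatcher, maxBatch and maxQueue are hypothetical names and tuning knobs, and send stands in for a real OTLP export call.

```typescript
interface FinishedSpan { traceId: string; name: string; durationMs: number }

interface SpanBatcher {
  enqueue(span: FinishedSpan): boolean; // false when the span was dropped
  flush(): number;                      // number of spans sent in this call
  droppedCount(): number;
}

function createBatcher(send: (batch: FinishedSpan[]) => void, maxBatch: number, maxQueue: number): SpanBatcher {
  let queue: FinishedSpan[] = [];
  let dropped = 0;
  return {
    // Request path: constant-time, no I/O. When the queue is full, drop
    // the span instead of blocking the caller.
    enqueue(span) {
      if (queue.length >= maxQueue) {
        dropped++;
        return false;
      }
      queue.push(span);
      return true;
    },
    // Background path: intended to be called from a timer; ships at most
    // maxBatch spans per export.
    flush() {
      if (queue.length === 0) return 0;
      const batch = queue.slice(0, maxBatch);
      queue = queue.slice(maxBatch);
      send(batch);
      return batch.length;
    },
    droppedCount() {
      return dropped;
    },
  };
}
```

Dropping under pressure is a deliberate trade-off: losing some telemetry is usually preferable to slowing down user requests.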
.NET Implementation Examples
Middleware: Extract or Create Trace
using System.Diagnostics;
using Serilog.Context;

public class TraceMiddleware
{
    private readonly RequestDelegate _next;

    public TraceMiddleware(RequestDelegate next) => _next = next;

    public async Task Invoke(HttpContext context)
    {
        var activity = new Activity("incoming-request");

        // Continue the caller's trace when a well-formed W3C traceparent is present.
        // TryParse returns false for a missing or malformed header instead of throwing.
        var traceParent = context.Request.Headers["traceparent"].FirstOrDefault();
        var traceState = context.Request.Headers["tracestate"].FirstOrDefault();
        if (ActivityContext.TryParse(traceParent, traceState, out var parent))
        {
            activity.SetParentId(parent.TraceId, parent.SpanId, parent.TraceFlags);
            activity.TraceStateString = parent.TraceState;
        }

        // Without a parent, Start() generates a new Trace ID and this becomes the root span
        // (the W3C ID format is the default from .NET 5 onward).
        activity.Start();
        context.Items["trace"] = activity;

        // Enrich the logging context for the duration of this request only.
        using (LogContext.PushProperty("trace_id", activity.TraceId.ToHexString()))
        {
            try
            {
                await _next(context);
            }
            finally
            {
                activity.Stop();
            }
        }
    }
}
Note: Modern .NET (System.Diagnostics) has built-in W3C support when using ActivitySource and OpenTelemetry instrumentation. Prefer the OpenTelemetry SDK over manual Activity handling.
Outbound HTTP Client Injection
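If you use IHttpClientFactory, add a delegating handler to inject traceparent:
public class TracePropagationHandler : DelegatingHandler
{
    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var activity = Activity.Current;
        // Activity.Id has the traceparent layout only when the W3C ID format is in use,
        // so skip injection for the legacy hierarchical format.
        if (activity != null
            && activity.IdFormat == ActivityIdFormat.W3C
            && !request.Headers.Contains("traceparent"))
        {
            request.Headers.TryAddWithoutValidation("traceparent", activity.Id);
            if (!string.IsNullOrEmpty(activity.TraceStateString))
            {
                request.Headers.TryAddWithoutValidation("tracestate", activity.TraceStateString);
            }
            // Better: inject through the OpenTelemetry propagators rather than by hand.
        }
        return base.SendAsync(request, cancellationToken);
    }
}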
Angular (Browser) Integration
Instrument key client-side actions:
Generate a client-side correlation id for UI flows (optional).
For single-page apps, use OpenTelemetry JS to create spans for page load, XHR, and navigation.
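When calling your APIs, include a traceparent header if one is available (OpenTelemetry JS can supply it). Example:
// http-interceptor.ts
import { Injectable } from '@angular/core';
import { HttpEvent, HttpHandler, HttpInterceptor, HttpRequest } from '@angular/common/http';
import { Observable } from 'rxjs';
import { TraceService } from './trace.service'; // application-specific service; the path is an assumption

@Injectable()
export class TraceInterceptor implements HttpInterceptor {
  constructor(private traceService: TraceService) {}

  intercept(req: HttpRequest<unknown>, next: HttpHandler): Observable<HttpEvent<unknown>> {
    const traceParent = this.traceService.getTraceParent();
    const headers = traceParent ? req.headers.set('traceparent', traceParent) : req.headers;
    return next.handle(req.clone({ headers }));
  }
}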
Common practice: servers generate canonical trace IDs; browser can attach X-Client-Trace to help group UI interactions, but trusting client-generated trace ids requires validation.
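The validation step mentioned above could follow a policy like the one in this TypeScript sketch. It is one possible policy, not a standard: resolveInboundTrace and InboundDecision are hypothetical names, and how callerTrusted is determined (for example network zone or authenticated service identity) is left as an assumption.

```typescript
interface InboundDecision {
  traceId: string;            // the ID this service will use from here on
  adoptedFromClient: boolean;
  clientTraceId?: string;     // kept only as an attribute for grouping UI interactions
}

const CLIENT_TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}$/;

function resolveInboundTrace(
  header: string | undefined,
  callerTrusted: boolean,
  newTraceId: () => string
): InboundDecision {
  const m = CLIENT_TRACEPARENT_RE.exec(header || "");
  const wellFormed = m !== null && !/^0+$/.test(m[1]) && !/^0+$/.test(m[2]);
  if (m === null || !wellFormed) {
    // Missing or malformed: start a fresh trace and record nothing from the client.
    return { traceId: newTraceId(), adoptedFromClient: false };
  }
  if (callerTrusted) {
    // Internal caller: continue its trace.
    return { traceId: m[1], adoptedFromClient: true };
  }
  // Untrusted caller with a valid header: start a server-side trace but keep the
  // client's ID as a searchable attribute.
  return { traceId: newTraceId(), adoptedFromClient: false, clientTraceId: m[1] };
}
```

This keeps the server's trace IDs canonical while still allowing UI interactions to be grouped by the client-supplied value.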
Messaging Example (Kafka + .NET)
Producer side:
var message = new Message<string, string>
{
    Key = key,
    Value = payload,
    Headers = new Headers()
};
var activity = Activity.Current;
if (activity != null)
{
    // With the W3C ID format, Activity.Id is already a traceparent string.
    message.Headers.Add("traceparent", Encoding.UTF8.GetBytes(activity.Id));
    // Optional convenience header so tools can read the Trace ID without parsing traceparent.
    message.Headers.Add("trace_id", Encoding.UTF8.GetBytes(activity.TraceId.ToHexString()));
}
await producer.ProduceAsync(topic, message);
Consumer side:
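var msg = consumer.Consume();
// TryGetLastBytes returns false when the header is absent; GetLastBytes would throw instead.
if (msg.Message.Headers.TryGetLastBytes("traceparent", out var traceParentHeader))
{
    var contextStr = Encoding.UTF8.GetString(traceParentHeader);
    if (ActivityContext.TryParse(contextStr, null, out var parentContext))
    {
        // activitySource is an application-defined ActivitySource (assumed, not shown).
        using var activity = activitySource.StartActivity("process-message", ActivityKind.Consumer, parentContext);
        // Process the message here; prefer OpenTelemetry propagators over manual parsing.
    }
}
Prefer the OpenTelemetry SDK's propagation utilities to avoid manual header parsing.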
Linking Business and Technical Correlation
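Always include business IDs (order id, invoice id) as structured fields in logs and span attributes. This allows two workflows:
Trace to business: start from a trace_id and see which order or invoice the request touched.
Business to trace: search by order id to find every trace that referenced it.
Together they allow end-to-end debugging: find the trace for the user action, then pivot to all traces referencing the same order id.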
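Pivoting between business IDs and trace IDs can be illustrated with a small in-memory query. In practice this would be a query in your log store; the TypeScript below only sketches the logic, and LogEntry, tracesForOrder and logsForTrace are hypothetical names.

```typescript
interface LogEntry { trace_id: string; service: string; message: string; order_id?: string }

// Business to trace: every distinct trace that mentioned the order.
function tracesForOrder(logs: LogEntry[], orderId: string): string[] {
  const seen: string[] = [];
  for (const entry of logs) {
    if (entry.order_id === orderId && seen.indexOf(entry.trace_id) === -1) {
      seen.push(entry.trace_id);
    }
  }
  return seen;
}

// Trace to everything: all log lines for one trace, including lines from
// services that never logged the business ID themselves.
function logsForTrace(logs: LogEntry[], traceId: string): LogEntry[] {
  return logs.filter((entry) => entry.trace_id === traceId);
}
```

The second query is what makes the combination useful: a downstream worker that only knows the trace_id still shows up when you start from the order.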
Observability Stack and Tooling
Collect and store: OpenTelemetry Collector → Jaeger, Tempo, Zipkin, or commercial APM.
Logs: Structured logs into Elastic, Splunk, or Datadog; include trace_id.
Metrics: Prometheus/Grafana for KPIs; include aggregated per-service metrics, not trace IDs.
UI: Jaeger/Tempo for trace visualization; Kibana or Datadog for logs linked by trace_id.
Make sure your collector supports tail-based sampling if you need to keep traces for errors only.
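To show what a tail-based decision involves, here is a simplified TypeScript sketch of the keep/drop policy applied after spans have been buffered. Real collectors implement this with time windows and memory limits; SpanSummary, selectTracesToKeep and the slowMs threshold are assumptions made for this example.

```typescript
interface SpanSummary { traceId: string; durationMs: number; error: boolean }

// Given the buffered spans of completed traces, return the Trace IDs worth
// keeping: any trace with an error span or a span at or above the latency threshold.
function selectTracesToKeep(buffered: SpanSummary[], slowMs: number): string[] {
  const order: string[] = [];
  const keep: Record<string, boolean> = {};
  for (const span of buffered) {
    if (!Object.prototype.hasOwnProperty.call(keep, span.traceId)) {
      keep[span.traceId] = false;
      order.push(span.traceId);
    }
    if (span.error || span.durationMs >= slowMs) keep[span.traceId] = true;
  }
  return order.filter((id) => keep[id]);
}
```

The decision is per trace, not per span, so a single failing span is enough to retain the whole transaction.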
Testing, Validation and Rollout
Unit tests: Verify propagators inject/extract round-trip.
Integration tests: Simulate multi-service flows and check logs contain the same trace_id.
Chaos tests: Kill services mid-trace and verify downstream logs still have trace ids.
Staged rollout: Enable tracing at low sample rate on production, verify collector stability, then increase sampling.
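For the integration-test step, a small checker over collected logs can report which services broke propagation. The TypeScript sketch below assumes the test harness can gather log records from every service; ServiceLog and propagationGaps are hypothetical names.

```typescript
interface ServiceLog { service: string; trace_id?: string }

// Returns the expected services that never logged the given Trace ID.
// An empty result means the trace propagated through every hop.
function propagationGaps(logs: ServiceLog[], traceId: string, expectedServices: string[]): string[] {
  return expectedServices.filter((service) => {
    for (const entry of logs) {
      if (entry.service === service && entry.trace_id === traceId) return false;
    }
    return true;
  });
}
```

A test would drive one request through the system with a known Trace ID and assert that the returned list is empty.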
Operational Practices
Trace retention policy: keep full traces for a short window (e.g., 7–30 days). Keep aggregates for longer.
Alerting: set alerts on tracing pipeline failures, collector backlog, or sudden drops in traces.
Access control: allow only authorized roles to view full trace data.
Anonymization: redact PII from traces before storing or restrict access.
Common Pitfalls And Remedies
Missing propagation on async boundaries: Use instrumented libraries or explicitly pass Activity.Current into tasks and thread pools.
High-cardinality tags in metrics: avoid attaching trace_id to long-term metrics; use logs/traces for high-cardinality queries.
Header trust: do not blindly trust trace headers from untrusted clients; validate format and optionally ignore client-supplied trace IDs for critical flows.
Excessive baggage: limit baggage size; use references to persisted context when necessary.
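The first pitfall, losing context across an async boundary, can be demonstrated with a deliberately simple model of an ambient context. The TypeScript sketch below uses hypothetical helpers (runWith, bindActive, activeTraceId) and a single module-level variable; it only covers callbacks that run later, and production code would normally rely on the context manager shipped with its tracing SDK.

```typescript
interface TraceCtx { traceId: string; spanId: string }

let activeCtx: TraceCtx | undefined;

// Run fn with ctx as the ambient context, restoring the previous one afterwards.
function runWith<T>(ctx: TraceCtx, fn: () => T): T {
  const previous = activeCtx;
  activeCtx = ctx;
  try {
    return fn();
  } finally {
    activeCtx = previous;
  }
}

// Capture the context that is active now and reinstate it whenever the
// returned callback eventually runs (for example from a queue or timer).
function bindActive<T>(fn: () => T): () => T {
  const captured = activeCtx;
  return () => (captured === undefined ? fn() : runWith(captured, fn));
}

function activeTraceId(): string | undefined {
  return activeCtx === undefined ? undefined : activeCtx.traceId;
}
```

A callback scheduled without bindActive sees no context when it runs later, which is exactly the symptom of missing propagation in logs.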
Example: End-to-End Scenario
User clicks “Pay” in Angular:
Angular interceptor attaches traceparent propagated by OpenTelemetry JS.
API Gateway extracts trace and starts root span.
Orders API creates a child span, logs orderId, userId.
Orders API publishes payment-request to Kafka with traceparent in headers.
Payment Worker consumes the message, continues the trace with a child span, and calls the payment provider's gateway with traceparent.
Payment provider returns; worker logs result and updates DB with an audit record containing trace_id.
All logs, traces and the DB audit row include trace_id so a single query shows the full path.
Conclusion
A unified logging correlation model is essential for reliable debugging, capacity planning and incident response in distributed systems. The best practice is to:
Adopt W3C Trace Context for interoperability.
Instrument using OpenTelemetry for a vendor-neutral approach.
Propagate trace and minimal baggage across HTTP, messaging and background jobs.
Correlate logs, traces and metrics by including trace_id in structured logs.
Use sampling and privacy guardrails to keep the system efficient and compliant.