Event Replay System (Reprocess Events Without Affecting Live Users)

Rajesh Gami

Modern distributed systems rely heavily on event-driven architectures. These systems generate ordered streams of events that represent changes in the system. While live processing handles real-time workloads, many enterprise scenarios require running past events again. This process is called event replay.

An Event Replay System enables platforms to reprocess historical events without disrupting live users or corrupting state. This capability is critical when fixing corrupted projections, migrating systems, rebuilding read models, or onboarding new services that must consume historical streams.

This article explains how to design a robust Event Replay System with a .NET backend and Angular frontend for orchestration.

Why Event Replay Matters

Event replay allows systems to:

Rebuild derived data stores (read models, analytics indices, materialized views).
Recover from projection failures or bugs.
Train ML models using historical data.
Onboard new bounded contexts or microservices.
Validate version upgrades with existing transaction history.

It essentially treats history as a first-class component of system reliability.

Key Requirements

A production-ready replay system must:

Replay events from a specific stream, partition, or global timeline.
Support filters (date range, event type, version).
Run replay in isolation from live traffic.
Track progress and allow pause/resume.
Guarantee idempotency, so that replaying the same event twice leaves projections unchanged.
Preserve event order.
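
The filtering and ordering requirements above can be illustrated with a small predicate. This is a minimal sketch under stated assumptions, not a definitive implementation: the `StoredEvent` and `ReplayFilter` shapes and the `matchesFilter` and `selectForReplay` functions are hypothetical names loosely based on the columns used later in this article. A real system would usually push these filters into the event store query instead of filtering in memory.

```typescript
// Hypothetical event shape, loosely mirroring the SQL example in this article.
interface StoredEvent {
  id: number;
  streamId: string;
  eventType: string;
  version: number;
  createdDate: Date;
}

// Hypothetical filter: every field is optional, and an omitted field matches everything.
interface ReplayFilter {
  streamId?: string;
  eventTypes?: string[];
  from?: Date; // inclusive lower bound
  to?: Date; // exclusive upper bound
  minVersion?: number;
}

function matchesFilter(evt: StoredEvent, f: ReplayFilter): boolean {
  if (f.streamId !== undefined && evt.streamId !== f.streamId) return false;
  if (f.eventTypes !== undefined && !f.eventTypes.includes(evt.eventType)) return false;
  if (f.from !== undefined && evt.createdDate.getTime() < f.from.getTime()) return false;
  if (f.to !== undefined && evt.createdDate.getTime() >= f.to.getTime()) return false;
  if (f.minVersion !== undefined && evt.version < f.minVersion) return false;
  return true;
}

// Returns the matching events in ascending id order without mutating the input.
function selectForReplay(events: StoredEvent[], f: ReplayFilter): StoredEvent[] {
  return events.filter((e) => matchesFilter(e, f)).sort((a, b) => a.id - b.id);
}
```

Sorting by `id` here assumes that the id reflects the order in which events were appended, as it does for the identity column in the SQL example.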

Architectural Overview

A typical replay architecture includes:

Event Store: Kafka, EventStoreDB, CosmosDB Change Feed, SQL Event Table.
Replay Processor: Dedicated service processing selected events.
Replay Controller (API): Starts, stops, pauses, configures replay.
UI Dashboard (Angular): Allows administrators to schedule and monitor replays.
Projection/Read Model System: Receives replayed events.

Workflow Diagram

Admin Triggers Replay
       |
       v
Replay Manager API
       |
       v
Load Events from Event Store
       |
       v
Send to Replay Processor
       |
       v
Apply Rules (idempotent transformation, version upgrade if needed)
       |
       v
Write to Projections / Consumer Services
       |
       v
Track Status & Metrics

Flowchart

START
  |
  v
Select event stream or filter
  |
  v
Replay already running?
  |--YES--> Reject request
  |
  NO
  |
  v
Load next event in chronological order
  |
  v
Apply event handler
  |
  v
Success?
  |--NO--> Retry or mark failure and continue
  |
  YES
  |
  v
All selected events processed?
  |--YES--> Mark replay complete
  |
  NO
  |
  v
Return to "Load next event in chronological order"
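
The flowchart can be sketched as a single function. This is a simplified, synchronous illustration under stated assumptions: `replayStream`, `ReplayEvent`, `ReplayOutcome`, and the in-memory `activeStreams` set are hypothetical names, and a production system would hold the "already running" lock in shared storage and run handlers asynchronously.

```typescript
interface ReplayEvent {
  id: number;
  type: string;
}

// Hypothetical handler signature: it throws when an event cannot be applied.
type ApplyFn = (evt: ReplayEvent) => void;

interface ReplayOutcome {
  status: "rejected" | "completed";
  processed: number;
  failedIds: number[];
}

// In-memory stand-in for a shared lock that records which streams are being replayed.
const activeStreams = new Set<string>();

function replayStream(
  streamId: string,
  events: ReplayEvent[],
  apply: ApplyFn,
  maxAttempts = 3
): ReplayOutcome {
  // "Replay already running?" -> reject the request.
  if (activeStreams.has(streamId)) {
    return { status: "rejected", processed: 0, failedIds: [] };
  }
  activeStreams.add(streamId);

  const failedIds: number[] = [];
  let processed = 0;
  try {
    // Process in ascending id order, which stands in for chronological order here.
    const ordered = [...events].sort((a, b) => a.id - b.id);
    for (const evt of ordered) {
      let ok = false;
      for (let attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
        try {
          apply(evt);
          ok = true;
        } catch {
          // Fall through to the next attempt.
        }
      }
      // After the last attempt, mark the failure and continue with the next event.
      if (ok) processed++;
      else failedIds.push(evt.id);
    }
  } finally {
    // Always release the lock, even if something unexpected is thrown.
    activeStreams.delete(streamId);
  }
  return { status: "completed", processed, failedIds };
}
```

Returning the failed ids instead of throwing keeps the loop running, which matches the "mark failure and continue" branch of the flowchart.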

Event Storage Model (SQL Example)

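CREATE TABLE EventStream (
    Id BIGINT IDENTITY PRIMARY KEY,        -- append order, used as the replay ordering key
    StreamId NVARCHAR(200) NOT NULL,
    EventType NVARCHAR(200) NOT NULL,
    EventBody NVARCHAR(MAX) NOT NULL,      -- serialized event payload
    Version INT NOT NULL,
    CreatedDate DATETIME2 NOT NULL,
    IsProcessed BIT NOT NULL DEFAULT 0
);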

Replay Metadata Table

CREATE TABLE ReplaySessions (
    ReplayId UNIQUEIDENTIFIER PRIMARY KEY,
    StreamId NVARCHAR(200),
    Status NVARCHAR(50),
    Progress DECIMAL(5,2),
    StartedAt DATETIME2,
    CompletedAt DATETIME2 NULL,
    FilterJson NVARCHAR(MAX)
);
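
The Status and Progress columns imply a small state machine and a percentage calculation. The sketch below is an assumption, not part of the original design: the status names and the `transition` and `progressPercent` helpers are hypothetical, chosen to match the start, pause, resume, and cancel operations described in this article.

```typescript
// Hypothetical status values for ReplaySessions.Status.
type ReplayStatus = "Pending" | "Running" | "Paused" | "Completed" | "Canceled" | "Failed";

// Allowed transitions; terminal states have no outgoing transitions.
const allowedTransitions: Record<ReplayStatus, ReplayStatus[]> = {
  Pending: ["Running", "Canceled"],
  Running: ["Paused", "Completed", "Canceled", "Failed"],
  Paused: ["Running", "Canceled"],
  Completed: [],
  Canceled: [],
  Failed: [],
};

function transition(current: ReplayStatus, next: ReplayStatus): ReplayStatus {
  if (!allowedTransitions[current].includes(next)) {
    throw new Error(`Invalid replay transition: ${current} -> ${next}`);
  }
  return next;
}

// Progress as a percentage rounded to two decimals (fits DECIMAL(5,2)), clamped to 0..100.
function progressPercent(processed: number, total: number): number {
  // An empty replay has nothing left to do, so it is reported as complete.
  if (total <= 0) return 100;
  const pct = (processed / total) * 100;
  return Math.min(100, Math.max(0, Math.round(pct * 100) / 100));
}
```

Clamping the value means the stored progress can never exceed 100, so completion can be detected with a simple equality or "processed equals total" check.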

Replay Service in .NET

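public async Task ReplayAsync(Guid replayId, CancellationToken ct = default)
{
    var session = await _registry.GetSession(replayId);

    // The event store is expected to return events in stream order.
    var events = await _eventStore.LoadEvents(
        session.StreamId,
        session.Filter);

    foreach (var evt in events)
    {
        // Stop promptly when the replay is canceled.
        ct.ThrowIfCancellationRequested();

        try
        {
            // isReplay tells the handler that this is not live traffic.
            await _handler.Process(evt, isReplay: true);
        }
        catch (Exception ex)
        {
            // Record the failure and continue so that one bad event does not abort the replay.
            await _logger.LogReplayFailure(replayId, evt, ex);
        }

        await _registry.UpdateProgress(replayId);
    }

    await _registry.Complete(replayId);
}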

Angular Admin UI Features

The UI allows:

Selecting stream or event type.
Applying date range filters.
Viewing a progress bar.
Pausing, resuming, or canceling a replay.
Viewing failure logs.

Example Angular service call

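// Assumes an injected HttpClient (this.http); the response is left untyped because its shape depends on the API.
startReplay(stream: string, filters: Record<string, unknown>): Observable<unknown> {
  return this.http.post<unknown>('/api/replay/start', { stream, filters });
}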

Ensuring Idempotency

Idempotency prevents double-processing. You can achieve this using:

Deduplication tables.
Hash-based projection checksums.
Event versioning.
Projection state markers.

Example

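// Skip events that this projection has already applied.
// (The variable is named evt because "event" is a reserved word in C#.)
if (_projectionState.LastProcessedEventId >= evt.Id)
    return;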
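
The projection state marker above can be shown end to end with a small in-memory projection. This is a minimal sketch, assuming events arrive in ascending id order; `BalanceEvent` and `BalanceProjection` are hypothetical names used only for illustration. In a real system the marker and the projection data would need to be committed in the same transaction.

```typescript
interface BalanceEvent {
  id: number;
  accountId: string;
  delta: number;
}

class BalanceProjection {
  private balances = new Map<string, number>();
  private lastProcessedEventId = 0;

  // Returns true when the event was applied and false when it was skipped as a duplicate.
  apply(evt: BalanceEvent): boolean {
    if (evt.id <= this.lastProcessedEventId) {
      return false; // Already applied: replaying it must not change the balance.
    }
    const current = this.balances.get(evt.accountId) ?? 0;
    this.balances.set(evt.accountId, current + evt.delta);
    this.lastProcessedEventId = evt.id;
    return true;
  }

  balanceOf(accountId: string): number {
    return this.balances.get(accountId) ?? 0;
  }
}
```

Replaying the same history twice against this projection leaves every balance unchanged, because the second pass is skipped entirely by the marker check.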

Performance Strategies

Replay in batches instead of executing event by event.
Process stream partitions in parallel where cross-partition ordering does not matter.
Use bulk writes for analytics systems like Elasticsearch.
Use checkpointing and resume support.
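
The strategies above can be sketched with three small helpers. This is an illustration under stated assumptions: `StreamEvent`, `partitionByStream`, `toBatches`, and `replayFromCheckpoint` are hypothetical names, and `writeBatch` stands in for whatever bulk write the target system offers. The code runs sequentially; each partition could be handed to a separate worker because order is only preserved within a partition.

```typescript
interface StreamEvent {
  id: number;
  streamId: string;
}

// Groups events by stream while keeping ascending id order inside each group.
function partitionByStream(events: StreamEvent[]): Map<string, StreamEvent[]> {
  const partitions = new Map<string, StreamEvent[]>();
  for (const evt of [...events].sort((a, b) => a.id - b.id)) {
    const bucket = partitions.get(evt.streamId) ?? [];
    bucket.push(evt);
    partitions.set(evt.streamId, bucket);
  }
  return partitions;
}

// Splits a list into consecutive chunks of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("Batch size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Replays only events after the checkpoint and returns the new checkpoint.
function replayFromCheckpoint(
  events: StreamEvent[],
  checkpoint: number,
  batchSize: number,
  writeBatch: (batch: StreamEvent[]) => void
): number {
  const pending = events.filter((e) => e.id > checkpoint).sort((a, b) => a.id - b.id);
  let lastId = checkpoint;
  for (const batch of toBatches(pending, batchSize)) {
    writeBatch(batch); // One bulk write per batch instead of one write per event.
    lastId = batch[batch.length - 1].id; // A real system would persist this value here.
  }
  return lastId;
}
```

Resuming after a pause or crash then amounts to calling `replayFromCheckpoint` again with the last persisted checkpoint.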

Testing Strategy

Test Type	Purpose
Dry-run replay	Validate results without writing
Partial replay	Test new feature or migration
Stress replay	Verify performance under load
Versioned replay	Validate schema migration or projection upgrade
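
A dry-run replay can be approximated by swapping the projection writer for one that only records what it would have written. The sketch below is an assumption about how this might look: `ProjectionWriter`, `DryRunWriter`, `AmountEvent`, `replayTotals`, and `changedKeys` are hypothetical names, and the "projection" is a simple running total per customer.

```typescript
interface AmountEvent {
  id: number;
  customerId: string;
  amount: number;
}

// Hypothetical write abstraction used by the replay code.
interface ProjectionWriter {
  upsert(key: string, value: number): void;
}

// Records planned writes in memory instead of touching the real read model.
class DryRunWriter implements ProjectionWriter {
  readonly planned = new Map<string, number>();

  upsert(key: string, value: number): void {
    this.planned.set(key, value);
  }
}

// Rebuilds running totals per customer and sends each new total to the writer.
function replayTotals(events: AmountEvent[], writer: ProjectionWriter): void {
  const totals = new Map<string, number>();
  for (const evt of events) {
    const total = (totals.get(evt.customerId) ?? 0) + evt.amount;
    totals.set(evt.customerId, total);
    writer.upsert(evt.customerId, total);
  }
}

// Lists the keys whose value would change if the planned writes were applied.
function changedKeys(current: Map<string, number>, planned: Map<string, number>): string[] {
  const changed: string[] = [];
  planned.forEach((value, key) => {
    if (current.get(key) !== value) changed.push(key);
  });
  return changed.sort();
}
```

Comparing the planned writes against the current read model shows which rows a real replay would change before anything is written.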

Failure Handling

All failures must be logged with a correlation ID.
Replay must continue unless configured otherwise.
Failed events must be retriable independently.
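
These three rules can be sketched together. The code below is a simplified illustration, not a prescribed design: `FailedEvent`, `FailureLog`, and `replayWithFailureLog` are hypothetical names, and the correlation ID format (`replayId:eventId`) and the `stopOnError` flag are assumptions made for the example.

```typescript
interface FailedEvent {
  correlationId: string;
  replayId: string;
  eventId: number;
  error: string;
  attempts: number;
}

class FailureLog {
  private entries = new Map<number, FailedEvent>();

  // Stores or updates the failure entry for an event and counts the attempt.
  record(replayId: string, eventId: number, err: unknown): FailedEvent {
    const previous = this.entries.get(eventId);
    const entry: FailedEvent = {
      correlationId: `${replayId}:${eventId}`, // Assumed format, for illustration only.
      replayId,
      eventId,
      error: err instanceof Error ? err.message : String(err),
      attempts: (previous?.attempts ?? 0) + 1,
    };
    this.entries.set(eventId, entry);
    return entry;
  }

  // Retries one failed event on its own; removes it from the log on success.
  retry(eventId: number, handler: (eventId: number) => void): boolean {
    const entry = this.entries.get(eventId);
    if (!entry) return false;
    try {
      handler(eventId);
      this.entries.delete(eventId);
      return true;
    } catch (err) {
      this.record(entry.replayId, eventId, err);
      return false;
    }
  }

  pending(): FailedEvent[] {
    return Array.from(this.entries.values());
  }
}

// Continues past failures by default; stops at the first failure when stopOnError is true.
function replayWithFailureLog(
  replayId: string,
  eventIds: number[],
  handler: (eventId: number) => void,
  log: FailureLog,
  stopOnError = false
): number {
  let succeeded = 0;
  for (const eventId of eventIds) {
    try {
      handler(eventId);
      succeeded++;
    } catch (err) {
      log.record(replayId, eventId, err);
      if (stopOnError) break;
    }
  }
  return succeeded;
}
```

Keeping failures keyed by event id lets an operator retry a single event after a fix without rerunning the whole replay.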

Real-World Best Practices

Keep replay isolated from the live system by using separate consumers or namespaces.
Avoid replaying into production without validation.
Make replay results verifiable through queries and domain rules.
Archive completed replays for audit.
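
One way to make replay results verifiable is to compare a checksum of the rebuilt projection with a checksum of a trusted copy and to run domain rules over the result. The sketch below is an assumption, not a standard technique from any particular library: `fnv1a` is a simple FNV-1a-style 32-bit hash over UTF-16 code units, and `projectionChecksum` and `negativeBalances` are hypothetical helpers. A production system might prefer a cryptographic hash.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units (not cryptographically secure).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Hashes the rows in sorted key order so that insertion order does not affect the result.
function projectionChecksum(rows: Map<string, number>): number {
  const canonical = Array.from(rows.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, value]) => `${key}=${value};`)
    .join("");
  return fnv1a(canonical);
}

// Example domain rule: no balance in the rebuilt projection may be negative.
function negativeBalances(rows: Map<string, number>): string[] {
  const offenders: string[] = [];
  rows.forEach((value, key) => {
    if (value < 0) offenders.push(key);
  });
  return offenders.sort();
}
```

Matching checksums suggest that the rebuilt projection equals the reference copy, while the rule check catches results that are internally consistent but violate business constraints.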

Conclusion

An Event Replay System transforms an event store from a passive log into an active resilience layer. It enables recovery, reprocessing, system migration, and analytical enrichment without disrupting live workloads. With proper isolation, idempotency safeguards, and monitoring, replay becomes a controlled and essential tool in enterprise-grade event-driven architectures.