PostgreSQL  

Why Read Replicas Don’t Reduce Load as Much as Teams Expect

Introduction

Adding read replicas feels like a clean scaling move. Reads go to replicas, writes stay on the primary, and everyone expects instant relief. This works in demos and early stages.

In real production systems, teams often add replicas and see only small improvements — or none at all. Sometimes things even get worse. Latency becomes inconsistent. Replication lag grows. Failovers feel riskier.

This article explains why read replicas often fail to reduce load as teams expect, what engineers typically see in production, and why the disappointment feels confusing and sudden.

The Promise vs the Reality

The promise is simple: offload read traffic.

The reality is messier. Most production workloads are not cleanly separated into read and write operations. Reads depend on fresh data. Writes trigger reads. Background jobs read data right after writing it.

A real-world analogy: imagine opening a second customer support desk, but all customers still need approval from the same manager. The desk helps a bit, but the bottleneck stays.

In PostgreSQL, the primary often remains the bottleneck.
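The read-after-write pattern is where stale replicas cause trouble first. The Python sketch below is a toy model built on simplifying assumptions. `Primary` and `Replica` are hypothetical in-memory classes. The primary's ordered change log stands in for WAL, and replication happens only when `replay()` is called. A background job updates a row and immediately reads it back from the replica. It gets the old value until replay catches up.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Hypothetical in-memory stand-in for the primary."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # ordered change log, a toy stand-in for WAL

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


@dataclass
class Replica:
    """Hypothetical replica that only sees changes it has replayed."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log records replayed so far

    def replay(self, primary):
        # Apply every record the replica has not seen yet, in commit order.
        for key, value in primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.log)

    def read(self, key):
        return self.data.get(key)


primary, replica = Primary(), Replica()
primary.write("order:42", "pending")
replica.replay(primary)

primary.write("order:42", "paid")   # a background job updates the order...
print(replica.read("order:42"))     # ...and reads it back from the replica: prints pending (stale)

replica.replay(primary)             # replay catches up
print(replica.read("order:42"))     # prints paid
```

In a real system this window may be short. Even so, any code path that writes and then reads right away must either tolerate the window or read from the primary. That is exactly the traffic that never leaves the primary.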

What Read Replicas Actually Do

Read replicas replay WAL from the primary and expose a read-only view of the data.

They do not:

  • Reduce write load on the primary

  • Eliminate WAL generation

  • Remove index maintenance costs

  • Avoid VACUUM pressure on the primary

They mainly help when a large portion of traffic is tolerant of slightly stale data.
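One way to measure how stale a replica is: take the distance between two WAL positions (LSNs). PostgreSQL writes an LSN as two hexadecimal numbers, the high and low 32 bits of a 64-bit byte position, for example `16/B374D848`. `pg_wal_lsn_diff()` returns the byte distance between two LSNs, and on the primary, `pg_stat_replication` reports each replica's `replay_lsn`. The Python sketch below reproduces that arithmetic client-side for illustration only. The function names are ours, the example positions are illustrative, and in practice you would let the server compute the difference.

```python
def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' into a 64-bit byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(current_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL written on the primary but not yet replayed on the replica.

    This is the same quantity as pg_wal_lsn_diff(current_lsn, replay_lsn).
    """
    return parse_lsn(current_lsn) - parse_lsn(replay_lsn)


# Illustrative positions, not real measurements.
print(replay_lag_bytes("16/B374D848", "16/B3000000"))  # 7657544 (about 7.3 MiB behind)
```

Byte lag measures outstanding replay work, not elapsed time. `pg_stat_replication` also reports time-based columns such as `replay_lag`.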

What Developers Usually See in Production

After adding replicas, teams report:

  • Primary CPU still high

  • Write latency unchanged

  • Replication lag increasing under load

  • Some reads still forced to primary

  • Inconsistent read performance

This leads to the uncomfortable question: “What did the replicas actually help with?”

Why the Improvement Feels Smaller Than Expected

Several hidden limits appear.

First, many reads cannot move. Anything requiring fresh data, transactions, or strong consistency still hits the primary.

Second, replicas consume resources too. Each one replays the full WAL stream while also serving queries, so replay and reads compete for the same CPU and I/O.

Third, applications often fall back to the primary when replicas lag or time out.

The result is partial offload at best.
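The three limits compound. A back-of-the-envelope model makes this visible. It rests on one big simplifying assumption: every query costs the primary the same amount of capacity. Under the model, writes always stay on the primary, reads that need fresh data stay too, and a fraction of the remaining reads fall back. The function name and the example inputs below are illustrative, not measured.

```python
def primary_load_share(write_share: float, fresh_read_share: float, fallback_rate: float) -> float:
    """Fraction of total load still served by the primary after adding replicas.

    write_share:      fraction of load that is writes (never offloadable)
    fresh_read_share: fraction of reads that need fresh data and stay on the primary
    fallback_rate:    fraction of offloadable reads that fall back to the primary
                      (replica lag, timeouts, errors)
    Assumes every query costs the same, which real workloads rarely do.
    """
    for x in (write_share, fresh_read_share, fallback_rate):
        if not 0.0 <= x <= 1.0:
            raise ValueError("all inputs must be fractions in [0, 1]")
    read_share = 1.0 - write_share
    stays_fresh = read_share * fresh_read_share
    falls_back = read_share * (1.0 - fresh_read_share) * fallback_rate
    return write_share + stays_fresh + falls_back


share = primary_load_share(write_share=0.3, fresh_read_share=0.4, fallback_rate=0.2)
print(f"primary still serves {share:.1%} of the load")  # primary still serves 66.4% of the load
```

In this hypothetical system, 70% of the load is reads, yet about two-thirds of the total load still lands on the primary. The model also ignores the extra work the primary does to stream WAL to each replica.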

Replication Lag Becomes the New Bottleneck

As read traffic increases on replicas, replay slows down.

Lag grows because:

  • WAL volume is high

  • Disk I/O on replicas is saturated

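  • Long-running queries on replicas conflict with replay and hold it back (for up to max_standby_streaming_delay, after which PostgreSQL cancels them)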

Once lag grows, replicas become less usable. Applications route more reads back to the primary, defeating the purpose.
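A simple queue model shows why lag feeds on itself. The sketch rests on three assumptions. WAL arrives at a per-second rate. The replica can replay a fixed amount per second, and that amount shrinks when heavy queries compete for its I/O. Replay occasionally pauses while it waits on a conflicting query, which PostgreSQL standbys can do for up to `max_standby_streaming_delay`. `simulate_backlog` is a hypothetical helper, and all numbers are illustrative.

```python
def simulate_backlog(wal_mb_per_s, replay_capacity_mb_per_s, paused_seconds=frozenset()):
    """Return the replay backlog (MB) at the end of each simulated second.

    wal_mb_per_s:             WAL generated on the primary in each second
    replay_capacity_mb_per_s: WAL the replica can replay per second
                              (lower when read queries compete for its I/O)
    paused_seconds:           seconds in which replay is held back, e.g. while
                              waiting on a conflicting long-running query
    """
    backlog = 0.0
    history = []
    for second, generated in enumerate(wal_mb_per_s):
        backlog += generated
        if second not in paused_seconds:
            # Replay as much as capacity allows, never more than what is pending.
            backlog -= min(backlog, replay_capacity_mb_per_s)
        history.append(backlog)
    return history


# Replay is faster than WAL generation, but a 2-second pause still leaves a backlog.
print(simulate_backlog([40] * 6, 50, paused_seconds={1, 2}))  # [0.0, 40.0, 80.0, 70.0, 60.0, 50.0]

# At peak, WAL generation exceeds replay capacity and the backlog never drains.
print(simulate_backlog([60] * 5, 50))  # [10.0, 20.0, 30.0, 40.0, 50.0]
```

With 40 MB/s of WAL and 50 MB/s of replay capacity, a two-second pause leaves 80 MB of backlog. That backlog drains at only 10 MB/s. Once WAL generation outpaces replay, lag grows for as long as the peak lasts.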

Real-World Example

A product adds two read replicas to handle dashboard traffic. Initially, load drops slightly. As usage grows, dashboards run heavy queries on replicas.

Replication lag increases. Reads that need fresh data are routed back to the primary. During peak hours, the primary is overloaded again, but now with extra replication pressure.

The system is more complex, not faster.

Advantages and Disadvantages of Read Replicas

Advantages (When Workloads Fit)

When workloads are replica-friendly:

  • Read-heavy traffic is offloaded

  • Primary stability improves

  • Scaling becomes incremental

  • Failover options improve

  • Cost efficiency increases

In these cases, replicas shine.

Disadvantages (When Assumptions Are Wrong)

When replicas are added blindly:

  • Complexity increases

  • Replication lag becomes critical

  • Failovers are riskier

  • Debugging becomes harder

  • Performance gains disappoint

At that point, replicas feel like overhead.

How Teams Should Think About This

Read replicas are not a universal scaling tool. They are a workload-specific optimization.

Teams should stop asking:

“Can we add replicas?”

And start asking:

  • Which reads can safely be stale?

  • How much WAL will this generate?

  • What happens when replicas lag?

Scaling reads without understanding writes rarely works.
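"Which reads can safely be stale?" can be answered explicitly in code. Here is a minimal sketch. It assumes the application already has a per-replica lag figure in seconds, for example from the `replay_lag` column of `pg_stat_replication` on the primary. Each read carries a staleness budget. It goes to the least-lagged replica that fits the budget, and otherwise stays on the primary. `ReadRequest` and `route_read` are hypothetical names, not part of any driver or connection pooler.

```python
from dataclasses import dataclass


@dataclass
class ReadRequest:
    name: str
    max_staleness_s: float  # 0 means "must see the latest committed data"


def route_read(req: ReadRequest, replica_lag_s: dict) -> str:
    """Return a replica whose current lag fits the request's budget, else 'primary'."""
    if req.max_staleness_s <= 0:
        return "primary"  # strict reads never leave the primary
    candidates = [
        (lag, name) for name, lag in replica_lag_s.items()
        if lag <= req.max_staleness_s
    ]
    if not candidates:
        return "primary"  # every replica is too far behind: fall back
    return min(candidates)[1]  # prefer the least-lagged eligible replica


lags = {"replica-a": 0.4, "replica-b": 2.5}
print(route_read(ReadRequest("dashboard", max_staleness_s=5.0), lags))  # replica-a
print(route_read(ReadRequest("checkout", max_staleness_s=0.0), lags))   # primary
```

Strict reads always land on the primary. So does any read that arrives while lag exceeds its budget. That is why the answers to "Which reads can safely be stale?" and "What happens when replicas lag?" decide how much load actually moves.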

Simple Mental Checklist

Before adding or relying on replicas, check:

  • What percentage of reads require fresh data?

  • How heavy is write and WAL volume?

  • Can replicas keep up under peak load?

  • What is the fallback behavior?

  • Are lag and routing clearly visible?

These checks prevent scaling illusions.
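The checklist can also be written down as a pre-flight review, so the answers are recorded rather than assumed. This is a sketch only. `ReplicaPlan` and `review_plan` are hypothetical, and the inputs are whatever your own measurements say. The default thresholds (at most 50% fresh reads, 30% replay headroom over peak WAL) are arbitrary placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class ReplicaPlan:
    fresh_read_share: float           # share of reads that must see fresh data (0..1)
    peak_wal_mb_per_s: float          # WAL generated on the primary at peak
    replay_capacity_mb_per_s: float   # replay throughput measured on a loaded replica
    fallback_defined: bool            # is there an explicit rule for when replicas lag?
    lag_and_routing_monitored: bool   # are lag and read routing visible on dashboards?


def review_plan(plan: ReplicaPlan, max_fresh_share: float = 0.5, min_headroom: float = 0.3) -> list:
    """Return warnings; an empty list means no red flags under these (placeholder) thresholds."""
    warnings = []
    if plan.fresh_read_share > max_fresh_share:
        warnings.append("most reads need fresh data and will stay on the primary")
    if plan.replay_capacity_mb_per_s < plan.peak_wal_mb_per_s * (1 + min_headroom):
        warnings.append("replicas may not keep up with peak WAL volume")
    if not plan.fallback_defined:
        warnings.append("no defined fallback behavior when replicas lag")
    if not plan.lag_and_routing_monitored:
        warnings.append("lag and routing are not visible")
    return warnings


plan = ReplicaPlan(
    fresh_read_share=0.6,
    peak_wal_mb_per_s=40,
    replay_capacity_mb_per_s=45,
    fallback_defined=True,
    lag_and_routing_monitored=False,
)
for warning in review_plan(plan):
    print("-", warning)
```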

Summary

Read replicas often disappoint because most production load still depends on the primary. Replication lag, consistency requirements, and write amplification limit how much traffic can be offloaded. The result feels surprising because replicas promise simplicity but expose deeper workload constraints. Teams that align replica usage with real access patterns get value; others inherit complexity without relief.