Introduction
Adding read replicas feels like a clean scaling move. Reads go to replicas, writes stay on the primary, and everyone expects instant relief. This works in demos and early stages.
In real production systems, teams often add replicas and see only small improvements — or none at all. Sometimes things even get worse. Latency becomes inconsistent. Replication lag grows. Failovers feel riskier.
This article explains why read replicas often fail to reduce load as teams expect, what engineers typically see in production, and why the disappointment feels confusing and sudden.
The Promise vs the Reality
The promise is simple: offload read traffic.
The reality is messier. Most production workloads do not separate cleanly into read and write operations. Reads depend on fresh data, writes trigger reads, and background jobs read back what they just wrote.
A real-world analogy: imagine opening a second customer support desk, but all customers still need approval from the same manager. The desk helps a bit, but the bottleneck stays.
In PostgreSQL, the primary often remains the bottleneck.
What Read Replicas Actually Do
Read replicas replay WAL from the primary and expose a read-only view of the data.
They do not:
Reduce write load on the primary
Eliminate WAL generation
Remove index maintenance costs
Avoid VACUUM pressure on the primary
They mainly help when a large portion of traffic is tolerant of slightly stale data.
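Because staleness tolerance is the deciding factor, read routing can be sketched as a comparison between a query's staleness budget and the replica's current lag. This is a minimal Python sketch; `Query`, its field names, and the routing rule are illustrative assumptions, not part of any driver or proxy:

```python
from dataclasses import dataclass

@dataclass
class Query:
    name: str
    max_staleness_s: float  # how stale a result may be; 0 means "must be fresh"

def route(query: Query, replica_lag_s: float) -> str:
    """Send a read to a replica only if its staleness budget covers current lag."""
    if query.max_staleness_s > replica_lag_s:
        return "replica"
    return "primary"
```

With this rule, a dashboard query that tolerates 30 seconds of staleness moves to a replica even under a couple of seconds of lag, while a checkout read with a zero budget always stays on the primary.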
What Developers Usually See in Production
After adding replicas, teams report:
Primary CPU still high
Write latency unchanged
Replication lag increasing under load
Some reads still forced to primary
Inconsistent read performance
This leads to the uncomfortable question: “What did the replicas actually help with?”
Why the Improvement Feels Smaller Than Expected
Several hidden limits appear.
First, many reads cannot move. Anything requiring fresh data, transactions, or strong consistency still hits the primary.
Second, replicas consume resources too. They replay WAL, run queries, and compete for I/O.
Third, applications often fall back to the primary when replicas lag or time out.
The result is partial offload at best.
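To make "partial offload" concrete, the sketch below counts how many reads actually leave the primary once freshness requirements and lag-based fallback are applied. The names, the workload, and the 5-second lag threshold are illustrative assumptions, not taken from any specific client library:

```python
def route_reads(reads, lag_samples, max_lag_s=5.0):
    """Route each read to 'replica' or 'primary'.

    reads: list of (name, needs_fresh) tuples
    lag_samples: replica lag in seconds observed when each read arrives
    A read moves to a replica only if it tolerates stale data AND the
    replica is reasonably caught up; otherwise it falls back.
    """
    routed = []
    for (name, needs_fresh), lag in zip(reads, lag_samples):
        if needs_fresh or lag > max_lag_s:
            routed.append((name, "primary"))  # fallback keeps load on the primary
        else:
            routed.append((name, "replica"))
    return routed

reads = [("cart", True), ("report", False), ("report", False), ("profile", True)]
lags = [1.0, 1.0, 8.0, 1.0]  # one read arrives during a lag spike
routed = route_reads(reads, lags)
offloaded = sum(1 for _, target in routed if target == "replica") / len(routed)
```

In this toy workload only one read in four reaches a replica: two require fresh data, and one stale-tolerant read arrives during a lag spike and falls back anyway.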
Replication Lag Becomes the New Bottleneck
As read traffic increases on replicas, replay slows down.
Lag grows because:
Replica queries compete with WAL replay for CPU and I/O
Long-running replica queries can conflict with incoming changes and stall replay
Write bursts on the primary generate WAL faster than replicas can apply it
Once lag grows, replicas become less usable. Applications route more reads back to the primary, defeating the purpose.
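Replication lag is ultimately a byte distance between WAL positions. The sketch below is plain Python assuming the textual `pg_lsn` format (such as `'0/16B3748'`) that PostgreSQL's `pg_stat_replication` view reports in columns like `sent_lsn` and `replay_lsn`:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a pg_lsn string like '0/16B3748' to an absolute byte position.
    The two hex fields are the high and low 32 bits of a 64-bit WAL offset."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)

def replay_lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Byte distance between WAL the primary has sent and WAL a replica
    has replayed; growth in this number over time is replication lag."""
    return lsn_to_bytes(sent_lsn) - lsn_to_bytes(replay_lsn)
```

For example, `replay_lag_bytes("0/2000000", "0/1000000")` is 16 MiB of WAL the replica still has to apply. Watching this value trend upward under load is the clearest signal that replicas are falling behind.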
Real-World Example
A product team adds two read replicas to handle dashboard traffic. Initially, load drops slightly. As usage grows, dashboards run heavy queries on the replicas.
Replication lag increases. Fresh reads are routed back to the primary. During peak hours, the primary is overloaded again, but now with extra replication pressure.
The system is more complex, not faster.
Advantages and Disadvantages of Read Replicas
Advantages (When Workloads Fit)
When workloads are replica-friendly:
Read-heavy traffic is offloaded
Primary stability improves
Scaling becomes incremental
Failover options improve
Cost efficiency increases
In these cases, replicas shine.
Disadvantages (When Assumptions Are Wrong)
When replicas are added blindly:
Reads that need fresh data still hit the primary
Replication lag grows as query load competes with replay
Stale reads surface as subtle consistency bugs
Routing, monitoring, and failover complexity increase
Costs rise without matching relief
At that point, replicas feel like overhead.
How Teams Should Think About This
Read replicas are not a universal scaling tool. They are a workload-specific optimization.
Teams should stop asking:
“Can we add replicas?”
And start asking:
Which reads can safely be stale?
How much WAL will this generate?
What happens when replicas lag?
Scaling reads without understanding writes rarely works.
Simple Mental Checklist
Before adding or relying on replicas, check:
What percentage of reads require fresh data?
How heavy is write and WAL volume?
Can replicas keep up under peak load?
What is the fallback behavior?
Are lag and routing clearly visible?
These checks prevent scaling illusions.
Summary
Read replicas often disappoint because most production load still depends on the primary. Replication lag, consistency requirements, and write amplification limit how much traffic can be offloaded. The shortfall feels surprising because replicas promise simplicity but expose deeper workload constraints. Teams that align replica usage with real access patterns get value; others inherit complexity without relief.