Table of Contents
Introduction
What Is Distributed Consensus and Why Raft?
Real-World Scenario: Coordinating Drone Swarms for Emergency Response
Core Concepts of Raft (Simplified)
A Complete Python Simulation
Running the Simulation
Best Practices for Real Systems
Conclusion
Introduction
In a distributed system—whether it’s a database cluster, a blockchain, or a fleet of drones—nodes must agree on a single truth. This is distributed consensus, and it’s one of the hardest problems in computer science.
The Raft protocol, designed for understandability, solves this by electing a leader and replicating log entries safely. In this article, you’ll build a working Raft simulation in pure Python—inspired by a real-life use case: coordinating drone swarms during disaster relief.
What Is Distributed Consensus and Why Raft?
Imagine three servers managing user accounts. If two say “Alice has $100” and one says “$200,” which is correct? Consensus ensures all nodes agree on the same state.
Raft achieves this through:
Leader election: One node becomes the leader
Log replication: The leader appends commands to logs and replicates them
Safety: Only up-to-date nodes can become leaders
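To make these mechanisms concrete, here is a minimal sketch of the two RPC message types Raft nodes exchange (field names follow Figure 2 of the Raft paper; the simulation later in this article simplifies them away):

from dataclasses import dataclass, field
from typing import List

@dataclass
class RequestVote:
    # Sent by a candidate to gather votes during an election
    term: int            # candidate's term
    candidate_id: int    # candidate requesting the vote
    last_log_index: int  # index of the candidate's last log entry
    last_log_term: int   # term of the candidate's last log entry

@dataclass
class AppendEntries:
    # Sent by the leader to replicate entries; an empty entries list is a heartbeat
    term: int            # leader's term
    leader_id: int       # lets followers redirect clients to the leader
    prev_log_index: int  # index of the log entry immediately preceding the new ones
    prev_log_term: int   # term of that preceding entry
    entries: List[str] = field(default_factory=list)  # commands to replicate
    leader_commit: int = 0  # leader's commit index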
Unlike Paxos, Raft is designed to be teachable and implementable—making it perfect for learning and lightweight systems.
Real-World Scenario: Coordinating Drone Swarms for Emergency Response
During a wildfire, a rescue team deploys 5 drones to map the fire perimeter. Each drone must agree on:
The latest safe evacuation route
Which zones are fully burned
Where survivors were spotted
If drones disagree, rescuers could be sent into danger.
Using Raft:
One drone becomes the leader (e.g., the one with the best signal)
All route updates are logged and replicated
If the leader crashes (e.g., smoke interference), a new leader is elected in seconds
Consensus ensures all drones act on the same map
This isn’t purely theoretical: companies like Zipline and Wing operate autonomous fleets that face exactly this coordination problem.
Core Concepts of Raft (Simplified)
We model three node states:
Follower: Waits for heartbeats from the leader
Candidate: Requests votes to become leader
Leader: Accepts client commands and replicates logs
Key rules:
Each node has a term number (like an election round)
Leaders send heartbeats to prevent new elections
A node grants its vote only if the candidate’s log is at least as up-to-date as its own (sketched below)
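That vote-granting rule has a precise definition in the Raft paper (section 5.4.1), which our simplified simulation skips. A minimal sketch of the check a follower would run:

def log_is_up_to_date(candidate_last_term: int, candidate_last_index: int,
                      my_last_term: int, my_last_index: int) -> bool:
    # Compare the terms of the last log entries; if they are equal,
    # the longer log is the more up-to-date one
    if candidate_last_term != my_last_term:
        return candidate_last_term > my_last_term
    return candidate_last_index >= my_last_index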
Our simulation focuses on leader election and log replication—the heart of Raft.
A Complete Python Simulation
import random
import time
from enum import Enum
from typing import List, Optional
class State(Enum):
FOLLOWER = 1
CANDIDATE = 2
LEADER = 3
class RaftNode:
def __init__(self, node_id: int, all_nodes: List[int]):
self.id = node_id
self.nodes = all_nodes
self.state = State.FOLLOWER
self.current_term = 0
self.voted_for: Optional[int] = None
self.log: List[str] = []
self.commit_index = 0
self.last_heartbeat = time.time()
self.election_timeout = self._random_timeout()
def _random_timeout(self) -> float:
return time.time() + random.uniform(1.0, 2.0)
    def on_heartbeat(self, term: int):
        # Ignore heartbeats from stale leaders (lower term)
        if term >= self.current_term:
            if term > self.current_term:
                # A new term begins: forget any vote cast in the old term
                self.voted_for = None
            self.current_term = term
            self.state = State.FOLLOWER
            self.last_heartbeat = time.time()
            self.election_timeout = self._random_timeout()
    def start_election(self):
        self.current_term += 1
        self.state = State.CANDIDATE
        self.voted_for = self.id
        self.election_timeout = self._random_timeout()  # restart the election timer
        votes = 1  # vote for self
        # Simulate requesting votes from the other nodes
        for node_id in self.nodes:
            if node_id == self.id:
                continue
            # In real Raft we'd send a RequestVote RPC, and each peer would
            # grant its single vote per term only if our log is at least as
            # up-to-date as its own. Simplified here: every peer grants the
            # vote, so simultaneous candidates can all win the same term;
            # the next heartbeat round demotes the extras.
            votes += 1
        if votes > len(self.nodes) // 2:  # strict majority
            self.state = State.LEADER
            print(f"Node {self.id} elected leader in term {self.current_term}")
    def append_entry(self, entry: str):
        if self.state == State.LEADER:
            self.log.append(entry)
            print(f"Leader {self.id} appended: {entry}")
            # In a real system the leader would replicate the entry and only
            # advance commit_index after a majority of followers acknowledge it
            self.commit_index = len(self.log) - 1
    def tick(self):
        now = time.time()
        if self.state == State.LEADER:
            # Heartbeats are driven by the simulation loop in simulate_raft()
            pass
        elif now > self.election_timeout:
            # No heartbeat arrived before the randomized deadline: stand for election
            self.start_election()
def simulate_raft():
node_ids = [1, 2, 3]
nodes = [RaftNode(i, node_ids) for i in node_ids]
# Simulate time steps
for step in range(20):
time.sleep(0.5)
print(f"\n--- Step {step + 1} ---")
        # The current leader broadcasts a heartbeat to every other node. If
        # the simplified election produced several leaders, one is picked at
        # random and its heartbeat demotes the rivals back to follower.
        leaders = [n for n in nodes if n.state == State.LEADER]
        if leaders:
            leader = random.choice(leaders)
            for node in nodes:
                if node.id != leader.id:
                    node.on_heartbeat(leader.current_term)
            # The leader appends a new command every five steps
            if step % 5 == 0:
                leader.append_entry(f"command-{step}")
# Each node processes its state
for node in nodes:
node.tick()
# Print status
for node in nodes:
print(f"Node {node.id}: {node.state.name} | Term {node.current_term} | Log len {len(node.log)}")
if __name__ == "__main__":
print(" Simulating Raft Consensus for Drone Swarm Coordination\n")
simulate_raft()
Running the Simulation
Run the script directly (e.g., save it as raft.py and run python raft.py). Election timeouts are randomized, so your output will vary; a typical run looks like this:
Simulating Raft Consensus for Drone Swarm Coordination
--- Step 1 ---
Node 1: FOLLOWER | Term 0 | Log len 0
Node 2: FOLLOWER | Term 0 | Log len 0
Node 3: FOLLOWER | Term 0 | Log len 0
--- Step 2 ---
Node 1: FOLLOWER | Term 0 | Log len 0
Node 2: FOLLOWER | Term 0 | Log len 0
Node 3: FOLLOWER | Term 0 | Log len 0
--- Step 3 ---
Node 1: FOLLOWER | Term 0 | Log len 0
Node 2: FOLLOWER | Term 0 | Log len 0
Node 3: FOLLOWER | Term 0 | Log len 0
--- Step 4 ---
Node 1 elected leader in term 1
Node 2 elected leader in term 1
Node 3 elected leader in term 1
Node 1: LEADER | Term 1 | Log len 0
Node 2: LEADER | Term 1 | Log len 0
Node 3: LEADER | Term 1 | Log len 0
--- Step 5 ---
Node 1: LEADER | Term 1 | Log len 0
Node 2: FOLLOWER | Term 1 | Log len 0
Node 3: FOLLOWER | Term 1 | Log len 0
--- Step 6 ---
Leader 1 appended: command-5
Node 1: LEADER | Term 1 | Log len 1
Node 2: FOLLOWER | Term 1 | Log len 0
Node 3: FOLLOWER | Term 1 | Log len 0
--- Step 7 ---
Node 1: LEADER | Term 1 | Log len 1
Node 2: FOLLOWER | Term 1 | Log len 0
Node 3: FOLLOWER | Term 1 | Log len 0
--- Step 8 ---
Node 1: LEADER | Term 1 | Log len 1
Node 2: FOLLOWER | Term 1 | Log len 0
Node 3: FOLLOWER | Term 1 | Log len 0
--- Step 9 ---
Node 1: LEADER | Term 1 | Log len 1
Node 2: FOLLOWER | Term 1 | Log len 0
Node 3: FOLLOWER | Term 1 | Log len 0
--- Step 10 ---
Node 1: LEADER | Term 1 | Log len 1
Node 2: FOLLOWER | Term 1 | Log len 0
Node 3: FOLLOWER | Term 1 | Log len 0
--- Step 11 ---
Leader 1 appended: command-10
Node 1: LEADER | Term 1 | Log len 2
Node 2: FOLLOWER | Term 1 | Log len 0
Node 3: FOLLOWER | Term 1 | Log len 0
--- Step 12 ---
Node 1: LEADER | Term 1 | Log len 2
Node 2: FOLLOWER | Term 1 | Log len 0
Node 3: FOLLOWER | Term 1 | Log len 0
--- Step 13 ---
Node 1: LEADER | Term 1 | Log len 2
Node 2: FOLLOWER | Term 1 | Log len 0
Node 3: FOLLOWER | Term 1 | Log len 0
--- Step 14 ---
Node 1: LEADER | Term 1 | Log len 2
Node 2: FOLLOWER | Term 1 | Log len 0
Node 3: FOLLOWER | Term 1 | Log len 0
--- Step 15 ---
Node 1: LEADER | Term 1 | Log len 2
Node 2: FOLLOWER | Term 1 | Log len 0
Node 3: FOLLOWER | Term 1 | Log len 0
--- Step 16 ---
Leader 1 appended: command-15
Node 1: LEADER | Term 1 | Log len 3
Node 2: FOLLOWER | Term 1 | Log len 0
Node 3: FOLLOWER | Term 1 | Log len 0
--- Step 17 ---
Node 1: LEADER | Term 1 | Log len 3
Node 2: FOLLOWER | Term 1 | Log len 0
Node 3: FOLLOWER | Term 1 | Log len 0
--- Step 18 ---
Node 1: LEADER | Term 1 | Log len 3
Node 2: FOLLOWER | Term 1 | Log len 0
Node 3: FOLLOWER | Term 1 | Log len 0
--- Step 19 ---
Node 1: LEADER | Term 1 | Log len 3
Node 2: FOLLOWER | Term 1 | Log len 0
Node 3: FOLLOWER | Term 1 | Log len 0
--- Step 20 ---
Node 1: LEADER | Term 1 | Log len 3
Node 2: FOLLOWER | Term 1 | Log len 0
Node 3: FOLLOWER | Term 1 | Log len 0
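One artifact worth noticing: at step 4 all three nodes win the election, because our simplified start_election grants every vote request. Real Raft’s one-vote-per-term rule makes this impossible; in the simulation, the next heartbeat round (step 5) demotes the extra leaders back to follower.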
Best Practices for Real Systems
Use RPCs: Replace simulation with gRPC or HTTP for real communication
Persist logs: Write logs to disk to survive crashes (see the sketch after this list)
Handle network partitions: Use quorum writes (majority must ack)
Add safety checks: Ensure log consistency before granting votes
Monitor leadership: Alert if elections happen too often (sign of instability)
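The persistence point deserves emphasis: a node that forgets its log, term, or vote after a restart can break Raft’s safety guarantees. Here is a minimal sketch of a durable append, assuming a simple JSON-lines file (real implementations use checksummed, segmented log files):

import json
import os

def append_durable(path: str, term: int, entry: str) -> None:
    # Append one log record and fsync so it survives a crash or power loss
    record = json.dumps({"term": term, "entry": entry})
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        f.flush()             # flush Python's buffer to the OS
        os.fsync(f.fileno())  # force the OS to write to stable storage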
For production, consider proven implementations such as etcd (which uses Raft) or hashicorp/raft.
Conclusion
Distributed consensus sounds complex, but Raft makes it understandable and implementable. Whether you’re building a database, a blockchain, or a drone swarm, the principles remain the same: elect a leader, replicate safely, and recover gracefully.
This simulation gives you the foundation. From here you can explore real Raft implementations, contribute to open-source projects, or design your own fault-tolerant system.