Introduction
Modern backend systems often require reactive file processing: whenever a file is created, modified, deleted, or moved, the system should respond automatically. In Python, one of the most reliable libraries for this purpose is watchdog.
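This article provides a technical deep dive into what watchdog is, why it is preferable to polling, how it works internally, its core components and event types, production considerations, and how it fits into event-driven and async architectures.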
What is Watchdog?
watchdog is a Python library that monitors file system events and triggers callbacks when changes occur.
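It allows you to build ingestion pipelines, automated document processing, and other workflows that are triggered by file changes.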
Install it:
pip install watchdog
Why Watchdog Instead of Polling?
Without watchdog, you might check for file changes like this:
from time import sleep

while True:
    check_folder()  # placeholder: rescan the folder and compare with the last scan
    sleep(5)
This is inefficient because:
It wastes CPU cycles
It adds latency
It doesn't scale well
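To make the cost concrete, here is a minimal sketch of what a `check_folder()` step typically has to do on every pass: list the whole directory and compare it with the previous listing. The helper names `take_snapshot` and `diff_snapshots` are illustrative assumptions, not part of any library.

```python
import os


def take_snapshot(folder):
    """Map each regular file in `folder` to its (mtime, size) pair."""
    snapshot = {}
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file():
                stat = entry.stat()
                snapshot[entry.name] = (stat.st_mtime_ns, stat.st_size)
    return snapshot


def diff_snapshots(old, new):
    """Return (created, modified, deleted) file names between two snapshots."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(
        name for name in old.keys() & new.keys() if old[name] != new[name]
    )
    return created, modified, deleted
```

Every pass touches every file even when nothing has changed, and a change is only noticed at the next pass, which is where the wasted CPU cycles and the added latency come from.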
Watchdog uses OS-native file system notification APIs, making it:
Event-driven
Efficient
Low-latency
Scalable
How Watchdog Works Internally
Watchdog connects to the operating system’s file notification mechanism:
| OS | Native Mechanism |
|---|---|
| Windows | ReadDirectoryChangesW |
| Linux | inotify |
| macOS | FSEvents |
Architecture:
Operating System → Native FS Event API → Watchdog Observer → Event Handler → Your Business Logic
When the OS reports a change, watchdog queues the event and dispatches it to your handler on the observer's background thread.
Core Components
Watchdog has three main components:
Observer
Monitors file system events.
from watchdog.observers import Observer
Event Handler
Handles events like create, modify, delete, move.
from watchdog.events import FileSystemEventHandler
Scheduler
Links observer and handler to a specific directory. In watchdog this is done with the observer.schedule() call rather than a separate class.
Basic Example
import time

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler


class FolderEventHandler(FileSystemEventHandler):
    def on_created(self, event):
        # Ignore directories
        if not event.is_directory:
            print(f"[EVENT] New file detected: {event.src_path}")
            trigger_message(event.src_path)


def trigger_message(file_path):
    # Your custom logic here
    print(f"[ACTION] Processing file: {file_path}")


if __name__ == "__main__":
    path_to_watch = "./watch_folder"  # the directory must already exist
    event_handler = FolderEventHandler()
    observer = Observer()
    observer.schedule(event_handler, path_to_watch, recursive=False)
    observer.start()
    print(f"Watching folder: {path_to_watch}")
    try:
        # Keep the main thread alive; the observer runs in its own thread
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
Available Event Types
You can override:
| Method | Trigger |
|---|---|
| on_created | File or folder created |
| on_modified | File or folder modified |
| on_deleted | File or folder deleted |
| on_moved | File moved or renamed |
| on_any_event | Any event |
Each event object contains:
event.src_path
event.is_directory
event.event_type
For move events:
event.dest_path
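As a sketch of how these attributes are typically read, the hypothetical helper below formats an event into a single log line. It relies only on the attributes listed above and looks up `dest_path` defensively, since it is only meaningful for move events. The `SimpleNamespace` object is a stand-in so the example runs without a live observer; in a real handler the function could be called from `on_any_event`.

```python
from types import SimpleNamespace


def describe_event(event):
    """Build a one-line description from a watchdog-style event object."""
    kind = "directory" if event.is_directory else "file"
    line = f"{event.event_type} {kind}: {event.src_path}"
    # dest_path is only meaningful for move events, so look it up defensively
    dest_path = getattr(event, "dest_path", "")
    if dest_path:
        line += f" -> {dest_path}"
    return line


# Stand-in object exposing the same attributes as a move event
moved = SimpleNamespace(
    event_type="moved",
    is_directory=False,
    src_path="in/a.csv",
    dest_path="done/a.csv",
)
print(describe_event(moved))  # moved file: in/a.csv -> done/a.csv
```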
Production Considerations
Duplicate Events
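When copying a file, you may receive several events for the same path, typically one created event followed by one or more modified events.
Solution: deduplicate events per path within a short time window before triggering processing.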
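A minimal sketch of per-path deduplication follows. It accepts the first event for a path and suppresses further events for that path until a time window has passed. The class name `EventDeduplicator` and the 1-second default window are assumptions for illustration, not part of watchdog.

```python
import time


class EventDeduplicator:
    """Suppress repeated events for the same path within a short time window."""

    def __init__(self, window_seconds=1.0):
        self.window_seconds = window_seconds
        self._last_accepted = {}  # path -> timestamp of the last accepted event

    def should_process(self, path, now=None):
        """Return True if `path` was not accepted within the current window."""
        now = time.monotonic() if now is None else now
        last = self._last_accepted.get(path)
        if last is not None and now - last < self.window_seconds:
            return False
        self._last_accepted[path] = now
        return True
```

Inside a handler this would wrap the processing call, for example `if deduplicator.should_process(event.src_path): trigger_message(event.src_path)`. The dictionary grows with the number of distinct paths, so a long-running service would likely want to evict old entries.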
File Not Fully Written
Large files may trigger events before writing completes.
Solution: wait until the file size stops changing before processing it.
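import os
import time


def wait_until_stable(file_path):
    # Poll the size once per second until two consecutive readings match.
    # os.path.getsize raises OSError if the file disappears while waiting.
    previous_size = -1
    while True:
        current_size = os.path.getsize(file_path)
        if current_size == previous_size:
            break
        previous_size = current_size
        time.sleep(1)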
Ignore Temporary Files
Many systems create temp files like .tmp, .part, .crdownload, or .swp files, or Office lock files whose names start with ~$.
Filter them before processing.
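One way to do the filtering is a small name-based check that runs before any processing. The pattern list below is only an example and should be adjusted to the tools that write into your folder; `is_temporary` is a hypothetical helper name.

```python
from fnmatch import fnmatch
from pathlib import PurePath

# Example patterns only; adjust them to the tools that write into your folder
TEMP_PATTERNS = ("*.tmp", "*.part", "*.crdownload", "*.swp", "~$*", ".~lock.*")


def is_temporary(file_path):
    """Return True if the file name matches one of the temporary-file patterns."""
    name = PurePath(file_path).name
    return any(fnmatch(name, pattern) for pattern in TEMP_PATTERNS)
```

In a handler, an early `if is_temporary(event.src_path): return` keeps these files out of the pipeline. Watchdog also ships a `PatternMatchingEventHandler` that accepts `ignore_patterns`, which may be a better fit when the filtering should happen before your handler methods are called.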
Monitoring Local vs Remote Systems
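Watchdog primarily monitors local file systems.
It may work with network-mounted shares (for example SMB/CIFS or NFS), but native change notifications are often not delivered for them; watchdog's PollingObserver is the usual fallback.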
It does NOT directly monitor:
AWS S3
Azure Blob
Google Cloud Storage
SharePoint Online
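For those, use cloud-native event systems such as Amazon S3 Event Notifications, Azure Event Grid, Pub/Sub notifications for Cloud Storage, or Microsoft Graph change notifications.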
Advanced Architecture: Event-Driven File Processing
Watchdog integrates well into event-driven systems.
Example architecture:
File Created → Watchdog Event → Event Bus (Kafka / Redis / RabbitMQ) → Processing Service → Database / AI / Storage
This pattern enables:
Horizontal scalability
Loose coupling
Async processing
Observability
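The sketch below illustrates the shape of this pattern with the standard library only. A `queue.Queue` stands in for the event bus, and a worker thread stands in for the processing service; a real deployment would replace both with a broker client and a separate process. All names here are assumptions for illustration.

```python
import json
import queue
import threading

event_bus = queue.Queue()  # stand-in for Kafka / Redis / RabbitMQ in this sketch


def publish_file_event(event_type, file_path):
    """Serialize the event so the handler stays thin and returns quickly."""
    message = json.dumps({"event_type": event_type, "path": file_path})
    event_bus.put(message)


def processing_service(results):
    """Consume messages until a None sentinel arrives."""
    while True:
        message = event_bus.get()
        if message is None:
            break
        payload = json.loads(message)
        results.append(payload["path"])  # real processing would happen here


processed = []
worker = threading.Thread(target=processing_service, args=(processed,))
worker.start()
publish_file_event("created", "watch_folder/invoice.pdf")  # what on_created would call
event_bus.put(None)  # tell the worker to stop
worker.join()
print(processed)  # ['watch_folder/invoice.pdf']
```

Keeping the handler limited to publishing a small message is what gives the loose coupling: the watcher does not need to know how, where, or how fast files are processed.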
Async Integration
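Watchdog itself is synchronous (handlers run on the observer's thread), but you can integrate it with async frameworks by handing events from that thread to the event loop through a queue.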
Example concept:
Watchdog → Queue → Async Worker → LLM Processing
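A minimal sketch of this hand-off with asyncio follows. A plain thread plays the role of watchdog's observer thread so the example runs on its own; in a real handler, `on_created_callback` would be called from `on_created`. The function names are hypothetical, and `asyncio.sleep(0)` is a placeholder for the real async work.

```python
import asyncio
import threading


async def async_worker(events, results):
    """Consume file paths from the asyncio queue until a None sentinel arrives."""
    while True:
        file_path = await events.get()
        if file_path is None:
            break
        await asyncio.sleep(0)  # placeholder for real async work
        results.append(file_path)


async def main():
    loop = asyncio.get_running_loop()
    events = asyncio.Queue()
    results = []

    def on_created_callback(file_path):
        # Runs on a non-asyncio thread, as a watchdog handler would.
        # call_soon_threadsafe is the safe way to reach the loop from there.
        loop.call_soon_threadsafe(events.put_nowait, file_path)

    def fake_observer_thread():
        # Stand-in for watchdog's observer thread in this sketch
        on_created_callback("watch_folder/a.txt")
        on_created_callback("watch_folder/b.txt")
        on_created_callback(None)

    producer = threading.Thread(target=fake_observer_thread)
    producer.start()
    await async_worker(events, results)
    producer.join()
    return results


results = asyncio.run(main())
print(results)  # ['watch_folder/a.txt', 'watch_folder/b.txt']
```

The important detail is that `asyncio.Queue` is not thread-safe, so the handler thread never touches it directly and instead schedules the `put_nowait` call on the loop.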
When to Use Watchdog
Use it when:
Monitoring local directories
Building ingestion pipelines
Triggering workflows on file changes
Automating document processing
Creating event-driven backend systems
Avoid it when:
You need to monitor cloud object storage directly
You need distributed remote monitoring
The system is very simple and polling is sufficient
Watchdog vs Polling
| Feature | Polling | Watchdog |
|---|---|---|
| CPU usage | Higher | Low |
| Latency | Delayed | Immediate |
| Scalability | Poor | Good |
| Architecture | Loop-based | Event-driven |
Summary
Watchdog is a cross-platform, event-driven Python library for file system monitoring.
It enables reactive systems where file system changes automatically trigger workflows.