
Watchdog in Python: A Technical Guide to File System Monitoring

Introduction

Modern backend systems often need to react to file changes: whenever a file is created, modified, deleted, or moved, the system should respond automatically. In Python, watchdog is a widely used library for this purpose.

This article provides a technical deep dive into:

  • What watchdog is

  • How it works internally

  • Supported events

  • Architecture

  • Production best practices

  • Advanced usage patterns

What is Watchdog?

watchdog is a Python library that monitors file system events and triggers callbacks when changes occur.

It allows you to build:

  • File ingestion pipelines

  • Automated document processors

  • ETL triggers

  • Log monitoring systems

  • AI workflow triggers

  • Event-driven backend systems

Install it:

pip install watchdog

Why Watchdog Instead of Polling?

Without watchdog, you might check for file changes like this:

import time

while True:
    check_folder()  # placeholder: scan the folder and compare with the last scan
    time.sleep(5)

This is inefficient because:

  • It wastes CPU cycles

  • It adds latency

  • It doesn't scale well
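To make the cost concrete, here is a minimal sketch of the work a polling loop has to repeat on every pass. The helper names snapshot and diff_snapshots are hypothetical, and the sketch only looks at regular files directly inside one directory; every file is stat-ed on every pass, even when nothing has changed.

```python
import os

def snapshot(directory):
    """Map each regular file in `directory` to its (mtime_ns, size)."""
    state = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                state[entry.name] = (info.st_mtime_ns, info.st_size)
    return state

def diff_snapshots(old, new):
    """Return (created, modified, deleted) file names between two snapshots."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(
        name for name in new.keys() & old.keys() if new[name] != old[name]
    )
    return created, modified, deleted
```

The work grows with the number of files in the folder, and a change is only noticed on the next pass, which is where the added latency comes from.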

Watchdog uses OS-native file system notification APIs, making it:

  • Event-driven

  • Efficient

  • Low-latency

  • Scalable

How Watchdog Works Internally

Watchdog connects to the operating system’s file notification mechanism:

OS         Native mechanism
Windows    ReadDirectoryChangesW
Linux      inotify
macOS      FSEvents

Architecture:

Operating System → Native FS Event API → Watchdog Observer → Event Handler → Your Business Logic

When the OS reports a change, watchdog queues the event and a background observer thread dispatches it to your handler, usually with little delay.

Core Components

Watchdog has three main components:

Observer

Runs a background thread that watches the scheduled directories and dispatches their events to handlers.

from watchdog.observers import Observer

Event Handler

Handles events like create, modify, delete, move.

from watchdog.events import FileSystemEventHandler

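Scheduling

Calling observer.schedule(handler, path, recursive=...) links an observer and a handler to a specific directory. Pass recursive=True to include subdirectories.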

Basic Example

import os
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class FolderEventHandler(FileSystemEventHandler):
    def on_created(self, event):
        # Ignore directories
        if not event.is_directory:
            print(f"[EVENT] New file detected: {event.src_path}")
            trigger_message(event.src_path)

def trigger_message(file_path):
    # Your custom logic here
    print(f"[ACTION] Processing file: {file_path}")

if __name__ == "__main__":
    path_to_watch = "./watch_folder"
    # The watched directory must exist before the observer starts
    os.makedirs(path_to_watch, exist_ok=True)

    event_handler = FolderEventHandler()
    observer = Observer()
    observer.schedule(event_handler, path_to_watch, recursive=False)

    observer.start()
    print(f"Watching folder: {path_to_watch}")

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        pass
    finally:
        # Stop the observer thread and wait for it to exit
        observer.stop()
        observer.join()

Available Event Types

You can override:

Method          Trigger
on_created      File or folder created
on_modified     File or folder modified
on_deleted      File or folder deleted
on_moved        File or folder moved or renamed
on_any_event    Any event

Event object contains:

event.src_path
event.is_directory
event.event_type

For move events:

event.dest_path
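As an illustration of how these attributes fit together, the sketch below formats an event into a log line. The function describe_event is a hypothetical helper, and SimpleNamespace objects stand in for real watchdog events so the snippet runs on its own; the event_type strings used here ("created", "moved") match the values watchdog sets on its events.

```python
from types import SimpleNamespace

def describe_event(event):
    """Build a one-line description from a watchdog-style event object."""
    kind = "directory" if event.is_directory else "file"
    if event.event_type == "moved":
        # Only move events carry a destination path
        return f"{kind} moved: {event.src_path} -> {event.dest_path}"
    return f"{kind} {event.event_type}: {event.src_path}"

# Stand-ins for real events (assumption: same attribute names as watchdog's)
created = SimpleNamespace(
    event_type="created", is_directory=False, src_path="in/a.csv"
)
moved = SimpleNamespace(
    event_type="moved", is_directory=False,
    src_path="in/a.csv", dest_path="done/a.csv",
)

print(describe_event(created))
print(describe_event(moved))
```

In a real handler, the same function could be called from on_any_event to log every event in one place.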

Production Considerations

Duplicate Events

When copying a file, you may receive:

  • created

  • multiple modified

Solution:

  • Track last modified timestamp

  • Debounce events
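One simple way to do this is sketched below, assuming it is acceptable to act on the first event for a path and drop repeats that arrive within a short window. The Debouncer class and its should_process method are hypothetical names, not part of watchdog. The clock is injectable so the logic can be tested without sleeping.

```python
import time

class Debouncer:
    """Suppress repeated events for the same path within `interval` seconds."""

    def __init__(self, interval=1.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last_seen = {}  # path -> time of the last accepted event

    def should_process(self, path):
        now = self.clock()
        last = self._last_seen.get(path)
        if last is not None and now - last < self.interval:
            # A recent event for this path was already accepted
            return False
        self._last_seen[path] = now
        return True
```

Inside on_created or on_modified, a handler would call should_process(event.src_path) and return early when it yields False. This variant does not wait for a quiet period, so it is best combined with a check that the file has finished writing. The dictionary also grows with the number of distinct paths, so a long-running service may want to prune old entries.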

File Not Fully Written

Large files may trigger events before writing completes.

Solution:

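import os
import time

def wait_until_stable(file_path, interval=1.0, timeout=60.0):
    """Return True once the file size stops changing, False on timeout or if the file disappears."""
    # interval and timeout defaults are illustrative; tune them for your workload
    deadline = time.monotonic() + timeout
    previous_size = -1
    while time.monotonic() < deadline:
        try:
            current_size = os.path.getsize(file_path)
        except OSError:
            # The file was deleted or moved while we were waiting
            return False
        if current_size == previous_size:
            return True
        previous_size = current_size
        time.sleep(interval)
    return False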

Ignore Temporary Files

Many systems create temp files like:

  • .tmp

  • .swp

  • ~filename

Filter them before processing.
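A minimal sketch of such a filter, using only the standard library. The is_temporary helper and the TEMP_PATTERNS tuple are hypothetical; the patterns mirror the three examples above and should be adjusted to whatever tools write into your folder.

```python
import fnmatch
import os

# Illustrative patterns taken from the list above
TEMP_PATTERNS = ("*.tmp", "*.swp", "~*")

def is_temporary(path, patterns=TEMP_PATTERNS):
    """Return True if the file name matches any temporary-file pattern."""
    name = os.path.basename(path)  # match on the file name, not the directory
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)
```

A handler can call is_temporary(event.src_path) first and return early on a match. Watchdog also ships PatternMatchingEventHandler, whose ignore_patterns argument offers a similar filter at the handler level.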

Monitoring Local vs Remote Systems

Watchdog primarily monitors:

  • Local file systems

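It may work with:

  • Mounted network drives (NFS, SMB), although native notifications often miss changes made by other machines. For such mounts, watchdog's PollingObserver (from watchdog.observers.polling) is usually the safer choice.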

It does NOT directly monitor:

  • AWS S3

  • Azure Blob

  • Google Cloud Storage

  • SharePoint Online

For those, use cloud-native event systems.

Advanced Architecture: Event-Driven File Processing

Watchdog integrates well into event-driven systems.

Example architecture:

File Created → Watchdog Event → Event Bus (Kafka / Redis / RabbitMQ) → Processing Service → Database / AI / Storage

This pattern enables:

  • Horizontal scalability

  • Loose coupling

  • Async processing

  • Observability
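The hand-off from handler to event bus can be sketched as follows. This is an assumption-laden illustration: an in-process queue.Queue stands in for a real broker, the names event_to_message, publish, and bus are hypothetical, and the message fields are just one reasonable choice. A SimpleNamespace replaces a real watchdog event so the snippet runs on its own.

```python
import json
import queue
from types import SimpleNamespace

def event_to_message(event):
    """Serialize the fields a downstream consumer needs into a JSON string."""
    payload = {
        "type": event.event_type,
        "path": event.src_path,
        "is_directory": event.is_directory,
    }
    return json.dumps(payload, sort_keys=True)

# queue.Queue stands in for a real broker (Kafka, Redis, RabbitMQ)
bus = queue.Queue()

def publish(event):
    """Put the serialized event on the bus; a handler would call this."""
    bus.put(event_to_message(event))

# Simulate what a handler's on_created would do with a real event
publish(SimpleNamespace(event_type="created", src_path="in/a.csv", is_directory=False))

# A processing service would consume and decode the message
message = json.loads(bus.get_nowait())
print(message["type"], message["path"])
```

Keeping the handler this thin means slow processing never blocks the observer thread, and the consumer can be scaled or replaced independently.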

Async Integration

Watchdog is thread-based rather than asyncio-native: handlers run on a background observer thread. You can still integrate it with async frameworks by:

  • Publishing events to a queue

  • Using background workers

  • Dispatching to async tasks

Example concept:

Watchdog → Queue → Async Worker → LLM Processing
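A minimal sketch of the thread-to-asyncio hand-off, using only the standard library. The function fake_observer_thread is a stand-in for code running on watchdog's observer thread, and worker is a hypothetical consumer. The key point is that asyncio.Queue is not thread-safe, so the put is scheduled on the event loop with call_soon_threadsafe.

```python
import asyncio
import threading

async def worker(q, results):
    """Consume paths until a None sentinel arrives."""
    while True:
        path = await q.get()
        if path is None:
            break
        results.append(f"processed {path}")

def fake_observer_thread(loop, q):
    """Stand-in for a watchdog handler: hands paths to the event loop."""
    for path in ("in/a.csv", "in/b.csv", None):
        # Schedule the put on the loop's own thread
        loop.call_soon_threadsafe(q.put_nowait, path)

async def main():
    q = asyncio.Queue()
    results = []
    loop = asyncio.get_running_loop()
    producer = threading.Thread(target=fake_observer_thread, args=(loop, q))
    producer.start()
    await worker(q, results)
    producer.join()
    return results

results = asyncio.run(main())
print(results)
```

In a real handler, on_created would make the same call_soon_threadsafe call with event.src_path, using a loop reference captured when the application starts.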

When to Use Watchdog

Use it when:

  • Monitoring local directories

  • Building ingestion pipelines

  • Triggering workflows on file changes

  • Automating document processing

  • Creating event-driven backend systems

Avoid it when:

  • You need to monitor cloud object storage directly

  • You need distributed remote monitoring

  • The system is very simple and polling is sufficient

Watchdog vs Polling

Feature         Polling       Watchdog
CPU usage       Higher        Low
Latency         Delayed       Immediate
Scalability     Poor          Good
Architecture    Loop-based    Event-driven

Summary

Watchdog is a lightweight, event-driven library for OS-level file monitoring, well suited to backend automation and ingestion systems.

It enables reactive systems where file system changes automatically trigger workflows.