AI Automation & Agents  

Event Deep Research | Open-Source Agent for Structured Historical Timelines

Abstract / Overview

Event Deep Research is an open-source project by Bernat Sampera that automates research on historical figures and outputs structured event timelines as JSON. (GitHub) The system is built on a multi-agent orchestration framework (LangGraph) and supports a range of LLMs and crawling/search tools. This guide covers its purpose, architecture, setup, usage, code structure, use cases, limitations, troubleshooting, and FAQs.

Conceptual Background

Why structured event timelines?

  • Historical data often exists as unstructured text (biographies, articles). Converting to a structured form enables programmatic analysis, search, filtering, and visualization.

  • An event timeline with fields like name, date, location, description, and id allows building interactive apps, knowledge graphs, and time-based visualizations. For example:

    {
      "name": "Birth in Ulm",
      "description": "Albert Einstein was born in Ulm, Germany to Hermann and Pauline Einstein",
      "date": {"year":1879,"note":"March 14"},
      "location":"Ulm, German Empire",
      "id":"time-1879-03-14T00:00:00Z"
    }

  • Multi-agent orchestration frameworks like LangGraph let you decompose the workflow (scoping → research → merge → output) into separate agents, which improves modularity and scalability; see the sketch below.
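
To make the decomposition concrete, here is a minimal LangGraph pipeline with three stub nodes. It illustrates the pattern only and is not the project's actual graph:

# Minimal, illustrative LangGraph pipeline: scoping -> research -> merge.
# Node bodies are stubs; the real project wires in LLMs and crawl tools.
from typing import TypedDict

from langgraph.graph import StateGraph, START, END

class TimelineState(TypedDict):
    person_to_research: str
    raw_findings: list
    structured_events: list

def scope(state: TimelineState) -> dict:
    return {"raw_findings": []}                                # decide what to research (stub)

def research(state: TimelineState) -> dict:
    return {"raw_findings": state["raw_findings"] + ["..."]}  # gather sources (stub)

def merge(state: TimelineState) -> dict:
    return {"structured_events": []}                           # dedupe + normalize events (stub)

builder = StateGraph(TimelineState)
builder.add_node("scope", scope)
builder.add_node("research", research)
builder.add_node("merge", merge)
builder.add_edge(START, "scope")
builder.add_edge("scope", "research")
builder.add_edge("research", "merge")
builder.add_edge("merge", END)
graph = builder.compile()

result = graph.invoke({"person_to_research": "Albert Einstein",
                       "raw_findings": [], "structured_events": []})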

Key components & technologies

  • LangGraph: A workflow/agent orchestration system built on top of LangChain concepts. It models a workflow as a graph of nodes (agents/tools) connected by edges. (LangChain Blog)

  • LangChain: Provides LLM/agent primitives. The project uses LangChain for LLM integration. (GitHub)

  • Crawling/Search tools: The repo supports tools like Firecrawl (web crawler) and Tavily (search API) to gather sources. (GitHub)

  • Structured JSON output: Each event is normalized into JSON with a consistent schema (name, description, date object, location, id).
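
This schema can be validated with a small Pydantic model. The following sketch mirrors the fields above; the repo may define its own classes, so treat this as illustrative:

# Illustrative Pydantic model matching the event JSON shown in this guide.
from pydantic import BaseModel

class EventDate(BaseModel):
    year: int
    note: str = ""  # e.g. "March 14" when the full date is known

class Event(BaseModel):
    name: str
    description: str
    date: EventDate
    location: str
    id: str  # e.g. "time-1879-03-14T00:00:00Z"

event = Event.model_validate({
    "name": "Birth in Ulm",
    "description": "Albert Einstein was born in Ulm, Germany ...",
    "date": {"year": 1879, "note": "March 14"},
    "location": "Ulm, German Empire",
    "id": "time-1879-03-14T00:00:00Z",
})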

Typical workflow (conceptual)

(Workflow diagram: scope the figure → research and crawl sources → merge and deduplicate events → emit structured JSON.)

Step-by-Step Walkthrough

Installation

  1. Clone repository: git clone https://github.com/bernatsampera/event-deep-research.git (GitHub)

  2. Change directory: cd event-deep-research

  3. Create virtual environment & install dependencies (requires Python 3.12+):

    uv venv && source .venv/bin/activate
    uv sync

  4. Copy example environment file: cp .env.example .env

  5. Set API keys in .env: FIRECRAWL_BASE_URL, FIRECRAWL_API_KEY, TAVILY_API_KEY, OPENAI_API_KEY/ANTHROPIC_API_KEY/GOOGLE_API_KEY depending on model. (GitHub)
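
Before launching, it can help to verify that the keys are actually set. A minimal sanity check, assuming the variable names listed above (only the key for your chosen model provider is required):

# check_env.py: report missing API keys before running the agent.
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # read .env from the current directory

REQUIRED = ["FIRECRAWL_BASE_URL", "FIRECRAWL_API_KEY", "TAVILY_API_KEY"]
# At least one of these is needed, depending on the configured model:
PROVIDERS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"]

missing = [key for key in REQUIRED if not os.getenv(key)]
if missing:
    print("Missing required keys:", ", ".join(missing))
if not any(os.getenv(key) for key in PROVIDERS):
    print("Set at least one provider key:", ", ".join(PROVIDERS))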

Usage

  • Via LangGraph Studio (recommended):

    uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.12 langgraph dev --allow-blocking

    Then open http://localhost:2024 and select the supervisor graph. Input:

    {
      "person_to_research": "Albert Einstein"
    }

  • The agent runs and produces structured JSON events as output; an example snippet is shown above.

Configuration

Open configuration.py. Key parameters:

  • llm_model: primary LLM model used (OpenAI, Anthropic, Google, local).

  • structured_llm_model, tools_llm_model, chunk_llm_model: override models for specific tasks. (GitHub)

  • Token and iteration limits: structured_llm_max_tokens, max_tool_iterations, max_chunks, etc.
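
As an illustration, per-run overrides in LangGraph are usually passed through the "configurable" dict. The field names below come from the list above, but treat the exact shape as project-specific and check configuration.py for the authoritative schema:

# Hypothetical per-run configuration overrides (field names from above).
overrides = {
    "configurable": {
        "llm_model": "openai:gpt-4o-mini",  # example model string
        "structured_llm_model": "openai:gpt-4o-mini",
        "max_tool_iterations": 5,
        "max_chunks": 20,
    }
}

# e.g. graph.invoke({"person_to_research": "Marie Curie"}, config=overrides)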

Architecture / Internals

  • Supervisor Agent: orchestrates workflow, decides which agent/tool to call next. (GitHub)

  • Research Agent: identifies relevant sources, delegates crawling and merges. (GitHub)

  • URL Crawler: uses Firecrawl to fetch and extract content from web pages. (GitHub)

  • Merge Agent: deduplicates and merges event data from multiple sources to produce clean structured output. (samperalabs)
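
The supervisor pattern is commonly built with conditional edges: a routing function inspects the state and names the next node. An illustrative sketch, not the repo's actual routing logic:

# Illustrative supervisor routing via LangGraph conditional edges.
from typing import TypedDict

from langgraph.graph import StateGraph, START, END

class AgentState(TypedDict, total=False):
    raw_findings: list
    structured_events: list

def route(state: AgentState) -> str:
    # Simplified heuristic: research first, then merge, then stop.
    if not state.get("raw_findings"):
        return "research_agent"
    if not state.get("structured_events"):
        return "merge_agent"
    return END

builder = StateGraph(AgentState)
builder.add_node("supervisor", lambda s: s)  # decides the next step
builder.add_node("research_agent", lambda s: {"raw_findings": ["..."]})            # search + crawl (stub)
builder.add_node("merge_agent", lambda s: {"structured_events": [{"name": "..."}]})  # dedupe + normalize (stub)
builder.add_edge(START, "supervisor")
builder.add_conditional_edges("supervisor", route,
                              ["research_agent", "merge_agent", END])
builder.add_edge("research_agent", "supervisor")
builder.add_edge("merge_agent", "supervisor")
graph = builder.compile()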

Code / JSON Snippets

Sample input JSON

{
  "person_to_research": "Marie Curie"
}

Sample output JSON structure

{
  "structured_events": [
    {
      "name": "Birth of Marie Curie",
      "description": "Marie Curie was born on November 7, 1867 in Warsaw, Poland.",
      "date": {"year":1867, "note":"November 7"},
      "location": "Warsaw, Congress Poland",
      "id": "time-1867-11-07T00:00:00Z"
    },
    {
      "name": "Nobel Prize in Physics",
      "description": "Marie Curie awarded Nobel Prize in Physics for research on radioactive substances.",
      "date": {"year":1903, "note":""},
      "location": "Stockholm, Sweden",
      "id": "time-1903-01-01T00:00:00Z"
    }
    // … more events …
  ]
}
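
The id field appears to follow a time-<ISO 8601 timestamp> pattern, with month and day defaulting to January 1 when only the year is known (as in the Nobel Prize event above). A hypothetical helper in that shape:

# Hypothetical helper: build an event id matching the sample output above.
from datetime import datetime, timezone

def event_id(year: int, month: int = 1, day: int = 1) -> str:
    ts = datetime(year, month, day, tzinfo=timezone.utc)
    return "time-" + ts.strftime("%Y-%m-%dT%H:%M:%SZ")

print(event_id(1867, 11, 7))  # time-1867-11-07T00:00:00Z
print(event_id(1903))         # time-1903-01-01T00:00:00Z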

Minimal Python snippet to run via CLI (this version calls the LangGraph dev server from the Usage section through the langgraph-sdk client):

# index.py (simplified): requires `langgraph dev` to be running on
# http://localhost:2024 (see Usage above).
import json
import sys

from langgraph_sdk import get_sync_client  # pip install langgraph-sdk

def run_research(person: str) -> None:
    client = get_sync_client(url="http://localhost:2024")
    # One-off, stateless run against the "supervisor" graph.
    result = client.runs.wait(
        None,  # no thread id: stateless run
        "supervisor",
        input={"person_to_research": person},
    )
    print(json.dumps(result, indent=2))

if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit('usage: python index.py "<person name>"')
    run_research(sys.argv[1])

Use Cases / Scenarios

  • Academic historians: quickly build event timelines for figures for teaching or publication.

  • Knowledge graph builders: import JSON events into graph DBs for linking entities and temporal queries.

  • Content creators: script timelines for documentaries, podcasts, or interactive web pages.

  • Business intelligence: adapt for corporate historical analysis (e.g., competitor history, market evolution).

  • Data-driven journalism: embed structured event timelines in articles for better reader interaction.

Limitations / Considerations

  • Model correctness/hallucination risk: LLMs may generate incorrect or fabricated events; vet output.

  • Source reliability: The crawler/search tools fetch web data, which may include low-quality or biased sources.

  • Date resolution issues: Some events may have approximate dates (year only) or ambiguous times.

  • Coverage bias: The agent may miss obscure figures or non-English sources.

  • Token and cost constraints: Large workloads (many events, deep web crawling) may incur cost/time.

  • Merge complexity: Deduplicating across multiple sources remains an open challenge (not fully automated).

  • Real-time vs static: The timeline represents a snapshot at runtime; it won’t update automatically unless re-run.

Fixes (Common pitfalls + solutions)

  • No output or empty structured_events: ensure .env keys are set and the LLM/tool keys are valid; check logs for errors.

  • Duplicate events or overlapping timelines: adjust merge deduplication thresholds in the configuration; increase source diversity.

  • Very long latency: limit max_chunks or max_tool_iterations; restrict to recent sources; run locally with a smaller model.

  • Poor date parsing (note fields blank): pre-filter sources for reliable biographical references; refine chunk size/overlap.

  • Coverage is missing non-English content: extend the crawler to support additional languages; supply a translation tool or model.

FAQs

Q: Which models are supported?
A: The system supports OpenAI, Anthropic, Google, or local models (e.g., Ollama) via configuration. (GitHub)

Q: Can I research topics beyond “people”?
A: While the agent is optimized for historical figures, you can adapt the input to any entity (company, event, concept), provided you update the prompt logic.

Q: How is deduplication handled?
A: The Merge Agent combines events from multiple sources and applies heuristics to deduplicate them; it is still best to review the output manually.
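
As a rough illustration of such a heuristic (not the Merge Agent's actual logic), events can be keyed on a normalized name plus year:

# Illustrative dedup heuristic: keep the first event per (name, year) key.
# Real merging needs fuzzier matching; this only shows the idea.
def dedupe(events: list[dict]) -> list[dict]:
    seen: set[tuple[str, int]] = set()
    unique = []
    for event in events:
        key = (event["name"].strip().lower(), event["date"]["year"])
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique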

Q: Is there a GUI or UI for output visualization?
A: The repo recommends LangGraph Studio for monitoring the agent workflow. The output itself is JSON; separate tools are required for visualization.

Q: Can I add images to events?
A: The roadmap includes “Add images to relevant events”; this is planned but not yet implemented. (GitHub)

References

  • GitHub Repo: bernatsampera/event-deep-research. (GitHub)

  • LangChain Blog: “Open Deep Research” overview. (LangChain Blog)

  • Blog analysis of the project by the author. (Samperalabs)

Conclusion

Event Deep Research offers a powerful open-source framework for converting biographical information into structured timelines. Its modular architecture leverages LangGraph orchestration, crawlers, and LLMs to deliver JSON-formatted events ready for integration. While limitations around accuracy, coverage, and deduplication remain, the tool provides a strong base for historians, developers, and content creators. With customization and careful configuration, it can accelerate timeline creation and enable downstream uses like visualization, analytics, and knowledge-graph ingestion.