Abstract / Overview
Event Deep Research is an open-source project by Bernat Sampera that automates the research of historical figures and outputs structured event timelines in JSON. (GitHub) The system uses a multi-agent orchestration framework (LangGraph) and supports various LLMs and crawling/search tools. This guide describes its purpose, architecture, setup, usage, code structure, use-cases, limitations, troubleshooting, and FAQs.
Conceptual Background
Why structured event timelines?
Historical data often exists as unstructured text (biographies, articles). Converting to a structured form enables programmatic analysis, search, filtering, and visualization.
An event timeline with fields like name, date, location, description, and id allows building interactive apps, knowledge graphs, and time-based visualizations. For example:
```json
{
  "name": "Birth in Ulm",
  "description": "Albert Einstein was born in Ulm, Germany to Hermann and Pauline Einstein",
  "date": {"year": 1879, "note": "March 14"},
  "location": "Ulm, German Empire",
  "id": "time-1879-03-14T00:00:00Z"
}
```
Multi-agent orchestration frameworks like LangGraph allow a workflow to be decomposed into stages (scoping → research → merge → output), which improves modularity and scalability.
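As a rough illustration of that decomposition (plain Python, not the project's actual LangGraph code; all function names and data shapes here are invented for the example):

```python
# Hypothetical sketch of the scoping -> research -> merge -> output pipeline.
# Each stage is a single-purpose function, mirroring the agent decomposition.

def scope(person: str) -> dict:
    """Decide what to research and return a research plan."""
    return {"person": person, "queries": [f"{person} biography", f"{person} timeline"]}

def research(plan: dict) -> list[dict]:
    """Gather raw candidate events (stubbed here instead of crawling the web)."""
    return [{"name": f"Birth of {plan['person']}", "year": 1879}]

def merge(raw_events: list[dict]) -> list[dict]:
    """Deduplicate events found across sources."""
    seen, merged = set(), []
    for ev in raw_events:
        key = (ev["name"], ev["year"])
        if key not in seen:
            seen.add(key)
            merged.append(ev)
    return merged

def run_pipeline(person: str) -> list[dict]:
    return merge(research(scope(person)))

events = run_pipeline("Albert Einstein")
```

In the real project each stage is a LangGraph node backed by an LLM or a tool, but the data flow between stages follows this shape.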
Key components & technologies
LangGraph: A workflow/agent orchestration system built on top of LangChain concepts. It describes graphs of nodes (agents/tools) and flows. (LangChain Blog)
LangChain: Provides LLM/agent primitives. The project uses LangChain for LLM integration. (GitHub)
Crawling/Search tools: The repo supports tools like Firecrawl (web crawler) and Tavily (search API) to gather sources. (GitHub)
Structured JSON output: Each event is normalized into JSON with a consistent schema (name, description, date object, location, id).
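One way to pin that schema down in code (a sketch only; the repo defines its own schema classes, and these dataclasses simply mirror the JSON examples in this guide):

```python
from dataclasses import dataclass, asdict
import json

# Illustrative mirror of the event schema shown in this guide's JSON
# examples -- not the project's actual model definitions.

@dataclass
class EventDate:
    year: int
    note: str = ""  # e.g. "March 14" when a partial date is known

@dataclass
class Event:
    name: str
    description: str
    date: EventDate
    location: str
    id: str

event = Event(
    name="Birth in Ulm",
    description="Albert Einstein was born in Ulm, Germany.",
    date=EventDate(year=1879, note="March 14"),
    location="Ulm, German Empire",
    id="time-1879-03-14T00:00:00Z",
)
print(json.dumps(asdict(event), indent=2))
```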
Typical workflow (conceptual)
(Workflow diagram: the supervisor scopes the request, delegates research and crawling, merges results, and emits structured JSON.)
Step-by-Step Walkthrough
Installation
Clone repository: git clone https://github.com/bernatsampera/event-deep-research.git
(GitHub)
Change directory: cd event-deep-research
Create virtual environment & install dependencies (requires Python 3.12+):

```
uv venv && source .venv/bin/activate
uv sync
```

Copy the example environment file: cp .env.example .env
Set API keys in .env: FIRECRAWL_BASE_URL, FIRECRAWL_API_KEY, TAVILY_API_KEY, and OPENAI_API_KEY/ANTHROPIC_API_KEY/GOOGLE_API_KEY depending on the chosen model. (GitHub)
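A quick sanity check before launching can catch missing keys early. This helper is not part of the repo; the key names come from the .env variables listed above, and you would add the key for whichever LLM provider you configure:

```python
import os

# Key names from the project's .env.example; extend with OPENAI_API_KEY,
# ANTHROPIC_API_KEY, or GOOGLE_API_KEY depending on the model you use.
REQUIRED = ["FIRECRAWL_API_KEY", "TAVILY_API_KEY"]

def missing_keys(required: list[str] = REQUIRED) -> list[str]:
    """Return the names of required environment variables that are unset."""
    return [key for key in required if not os.environ.get(key)]

if missing := missing_keys():
    print(f"Missing environment variables: {', '.join(missing)}")
```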
Usage
Via LangGraph Studio (recommended):
```
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.12 langgraph dev --allow-blocking
```

Then open http://localhost:2024 and select the supervisor graph. Input:

```json
{
  "person_to_research": "Albert Einstein"
}
```
The agent will run and produce structured JSON events as output. Example output snippet shown above.
Configuration
Open configuration.py. Key parameters:
llm_model: primary LLM model used (OpenAI, Anthropic, Google, local).
structured_llm_model, tools_llm_model, chunk_llm_model: override models for specific tasks. (GitHub)
Token and iteration limits: structured_llm_max_tokens, max_tool_iterations, max_chunks, etc.
Architecture / Internals
Supervisor Agent: orchestrates workflow, decides which agent/tool to call next. (GitHub)
Research Agent: identifies relevant sources, delegates crawling and merges. (GitHub)
URL Crawler: uses Firecrawl to fetch and extract content from web pages. (GitHub)
Merge Agent: deduplicates and merges event data from multiple sources to produce clean structured output. (samperalabs)
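The deduplication step could be approximated with a simple heuristic like the following. This is a toy stand-in, not the repo's actual Merge Agent logic: events sharing a normalized name and year are collapsed, keeping the longest description.

```python
def dedupe_events(events: list[dict]) -> list[dict]:
    """Collapse events that share a normalized name and year, keeping the
    longest description. A toy stand-in for the Merge Agent's heuristics."""
    merged: dict[tuple, dict] = {}
    for ev in events:
        key = (ev["name"].strip().lower(), ev["date"]["year"])
        kept = merged.get(key)
        if kept is None or len(ev["description"]) > len(kept["description"]):
            merged[key] = ev
    return list(merged.values())

events = [
    {"name": "Birth of Marie Curie", "date": {"year": 1867},
     "description": "Born in Warsaw."},
    {"name": "birth of marie curie", "date": {"year": 1867},
     "description": "Marie Curie was born on November 7, 1867 in Warsaw."},
]
deduped = dedupe_events(events)
```

Real-world merging is harder: the same event may be reported with different names or slightly different dates, which is why the Limitations section calls deduplication an open challenge.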
Code / JSON Snippets
Sample input JSON
```json
{
  "person_to_research": "Marie Curie"
}
```
Sample output JSON structure
```json
{
  "structured_events": [
    {
      "name": "Birth of Marie Curie",
      "description": "Marie Curie was born on November 7, 1867 in Warsaw, Poland.",
      "date": {"year": 1867, "note": "November 7"},
      "location": "Warsaw, Congress Poland",
      "id": "time-1867-11-07T00:00:00Z"
    },
    {
      "name": "Nobel Prize in Physics",
      "description": "Marie Curie was awarded the Nobel Prize in Physics for research on radioactive substances.",
      "date": {"year": 1903, "note": ""},
      "location": "Stockholm, Sweden",
      "id": "time-1903-01-01T00:00:00Z"
    }
    // … more events …
  ]
}
```
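Because each id embeds an ISO-8601 timestamp, downstream code can order events chronologically without re-parsing the date object. A small consumer sketch, assuming the output shape shown above:

```python
import json

# Order events chronologically using the ISO timestamp embedded in each id
# (format: "time-<ISO-8601>"). Assumes the output shape shown above.
output = json.loads("""
{
  "structured_events": [
    {"name": "Nobel Prize in Physics", "id": "time-1903-01-01T00:00:00Z"},
    {"name": "Birth of Marie Curie", "id": "time-1867-11-07T00:00:00Z"}
  ]
}
""")

def timestamp(event: dict) -> str:
    return event["id"].removeprefix("time-")

timeline = sorted(output["structured_events"], key=timestamp)
for ev in timeline:
    print(timestamp(ev), ev["name"])
```

ISO-8601 strings sort correctly as plain text, so no datetime parsing is needed for ordering.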
Minimal Python snippet to run via CLI
```python
# index.py (simplified; run_graph is an illustrative helper, not a real
# LangGraph API -- invoke the compiled graph per the repo's entry point)
import json
import sys

from langgraph import run_graph  # hypothetical import

def run_research(person: str) -> None:
    input_payload = {"person_to_research": person}
    result = run_graph("supervisor", input_payload)
    print(json.dumps(result, indent=2))

if __name__ == "__main__":
    run_research(sys.argv[1])
```
Use Cases / Scenarios
Academic historians: quickly build event timelines for figures for teaching or publication.
Knowledge graph builders: import JSON events into graph DBs for linking entities and temporal queries.
Content creators: script timelines for documentaries, podcasts, or interactive web pages.
Business intelligence: adapt for corporate historical analysis (e.g., competitor history, market evolution).
Data-driven journalism: embed structured event timelines in articles for better reader interaction.
Limitations / Considerations
Model correctness/hallucination risk: LLMs may generate incorrect or fabricated events; vet output.
Source reliability: The crawler/search tools fetch web data, which may include low-quality or biased sources.
Date resolution issues: Some events may have approximate dates (year only) or ambiguous times.
Coverage bias: The agent may miss obscure figures or non-English sources.
Token and cost constraints: Large workloads (many events, deep web crawling) may incur cost/time.
Merge complexity: Deduplicating across multiple sources remains an open challenge (not fully automated).
Real-time vs static: The timeline represents a snapshot at runtime; it won’t update automatically unless re-run.
Fixes (Common pitfalls + solutions)
| Issue | Fix |
|---|---|
| No output or empty structured_events | Ensure .env keys are set and LLM/tool keys are valid; check logs for errors. |
| Duplicate events or overlapping timelines | Adjust merge deduplication thresholds in configuration; increase source diversity. |
| Very long latency | Limit max_chunks or max_tool_iterations; restrict to recent sources; run locally with a smaller model. |
| Poor date parsing (blank note fields) | Pre-filter sources for reliable biographical references; refine chunk size/overlap. |
| Missing non-English coverage | Extend the crawler to support additional languages; supply a translation tool or model. |
FAQs
Q: Which models are supported?
A: The system supports OpenAI, Anthropic, Google, or local models (e.g., Ollama) via configuration. (GitHub)
Q: Can I research topics beyond “people”?
A: While optimized for historical figures, you can adapt input to any entity (company, event, concept), provided you update prompt logic.
Q: How is deduplication handled?
A: The Merge Agent combines events from multiple sources and applies heuristics to deduplicate; still best to review the output manually.
Q: Is there a GUI or UI for output visualization?
A: The repo recommends using LangGraph Studio for monitoring the agent workflow. Visual output is JSON; separate tools are required for visualization.
Q: Can I add images to events?
A: Roadmap includes “Add images to relevant events”. It is planned but not fully implemented. (GitHub)
References
GitHub Repo: bernatsampera/event-deep-research. (GitHub)
LangChain Blog: “Open Deep Research” overview. (LangChain Blog)
Blog analysis of the project by the author. (Samperalabs)
Conclusion
Event Deep Research offers a potent open-source framework for converting biographical information into structured timelines. Its modular architecture leverages LangGraph orchestration, crawlers, and LLMs to deliver JSON-formatted events ready for integration. While limitations around accuracy, coverage, and deduplication remain, the tool provides a strong base for historians, developers, and content creators. With customization and careful configuration, it can accelerate timeline creation and enable downstream uses like visualization, analytics, and knowledge-graph ingestion.