
OpenMemory – Self-Hosted Long-Term AI Memory Engine for LLMs

Abstract / Overview

OpenMemory is a self-hosted, modular AI memory engine that provides long-term, persistent, structured memory for large language model (LLM) applications. (GitHub) It supports multi-sector embeddings (episodic, semantic, procedural, emotional, reflective) and links memories via a sparse graph (“single-waypoint linking”) for efficient recall. (GitHub) It can be deployed locally, with Docker, or with self-managed embeddings. The goal is to give AI agents and assistants the ability to remember user data, preferences, past interactions, and context over time, with data ownership, explainability, and lower cost than SaaS memory layers. (GitHub)

Conceptual Background


Why long-term memory matters in AI

Modern LLMs (such as GPT-4 and Gemini) are stateless between sessions and constrained by finite context windows; they forget prior sessions and struggle to recall user preferences, past interactions, or tasks beyond the immediate context. This limits personalization, continuity, and multi-turn coherence.

Memory engines aim to provide a persistent layer where context, experiences, and structured metadata are stored and retrieved over time.

Differentiation: Traditional vector DBs vs memory engines

Vector databases (e.g., Chroma, Weaviate) provide semantic similarity search but typically lack layered memory structure, decay logic, graph linking, and explainability. OpenMemory positions itself as a “memory engine” rather than just a vector index. (Daily.dev)

Key architectural concept: Hierarchical Memory Decomposition (HMD)

OpenMemory implements what it calls a Hierarchical Memory Decomposition (HMD) architecture. (GitHub) Its features include:

  • One canonical node per memory (avoiding duplication)

  • Multi-sector embeddings: different “sectors” such as episodic (events), semantic (facts), procedural (skills), emotional (sentiments), reflective (meta-cognition)

  • Sparse graph linking (“single‐waypoint linking”), akin to a biological memory graph

  • Composite retrieval: combining similarity search over sectors + graph expansion + decay/salience weighting

This design aims for better recall, lower latency, and explainability compared to flat embedding systems. (GitHub)
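
To make the HMD layout concrete, here is a small illustrative data model in Python: one canonical node, per-sector embedding vectors, a salience value, and sparse waypoint links. The field names and structure are assumptions for illustration, not OpenMemory's actual schema.

from dataclasses import dataclass, field

# The five memory sectors described above
SECTORS = ("episodic", "semantic", "procedural", "emotional", "reflective")

@dataclass
class MemoryNode:
    id: str
    content: str                                   # one canonical node per memory
    embeddings: dict[str, list[float]] = field(default_factory=dict)  # sector -> vector
    salience: float = 1.0                          # reinforced on access, decayed over time
    last_accessed: float = 0.0                     # unix timestamp, used for recency
    waypoints: dict[str, float] = field(default_factory=dict)  # linked node id -> edge weight (sparse)

# Example: a memory embedded in its two most likely sectors (OM_VEC_DIM=768)
node = MemoryNode(
    id="mem-001",
    content="User prefers dark mode",
    embeddings={"semantic": [0.01] * 768, "episodic": [0.02] * 768},
    waypoints={"mem-042": 0.7},
)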

Metrics & cost-benefit summary

According to the README, OpenMemory reports the following comparative metrics:

  • Average response time at ~100k nodes: ~110-130 ms. (GitHub)

  • Estimated cost per 1M tokens: ~US $0.30-0.40 with hosted embeddings. (GitHub)

  • Self-hosted cost (100k memories): ~US $5-8/month, vs ~US $60-120/month for SaaS memory layers. (GitHub)

These numbers illustrate the promise of lower cost, better performance, and full data ownership.

Step-by-Step Walkthrough

Prerequisites & Setup

From the README of OpenMemory: (GitHub)

  • Node.js version 20+

  • SQLite version 3.40+ (bundled)

  • Optional embedding providers: OpenAI, Gemini, Ollama

  • Clone the repo:

    git clone https://github.com/caviraoss/openmemory.git
    cd openmemory/backend
    cp .env.example .env
    npm install
    npm run dev
  • Example .env entries:

    OM_PORT=8080
    OM_DB_PATH=./data/openmemory.sqlite
    OM_EMBEDDINGS=openai
    OPENAI_API_KEY=YOUR_KEY
    OLLAMA_URL=http://localhost:11434
    OM_VEC_DIM=768
    OM_MIN_SCORE=0.3
    OM_DECAY_LAMBDA=0.02
    OM_LG_NAMESPACE=default
    OM_LG_MAX_CONTEXT=50
    OM_LG_REFLECTIVE=true
  • Docker setup (for production): run docker-compose up --build -d. Port 8080 exposes the OpenMemory API, and data is persisted under /data/openmemory.sqlite. (GitHub)

Architecture & Retrieval Flow

From ARCHITECTURE.md and README: (GitHub)

Core components:

  • Backend: TypeScript + Node.js REST API

  • Storage: SQLite (WAL) for metadata, vectors & graph

  • Embedding providers: E5/BGE/OpenAI/Gemini/Ollama (configurable)

  • Graph logic: in-process sparse graph

  • Scheduler: node-cron for decay, pruning, and log repair

Retrieval flow:

  1. User request arrives → text is sectorised into 2-3 likely memory types (sectors).

  2. Generate query embeddings for those sectors.

  3. Search over sector vectors + optional mean cache.

  4. Take top-K matches → expand via one-hop waypoint graph.

  5. Composite score computed:
    0.6 × similarity + 0.2 × salience + 0.1 × recency + 0.1 × link weight (GitHub)
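
As a sketch of step 5, the composite score could be computed as below; the 0.6/0.2/0.1/0.1 weights come from the README, while the candidate structure and the exponential recency term (reusing OM_DECAY_LAMBDA) are illustrative assumptions rather than the engine's exact implementation.

import math
import time

def composite_score(similarity: float, salience: float, last_accessed: float,
                    link_weight: float, decay_lambda: float = 0.02) -> float:
    # Recency modelled as exponential decay of age in days (assumption for illustration)
    age_days = (time.time() - last_accessed) / 86_400
    recency = math.exp(-decay_lambda * age_days)
    return 0.6 * similarity + 0.2 * salience + 0.1 * recency + 0.1 * link_weight

# Rank the top-K matches plus their one-hop waypoint neighbours
candidates = [
    {"id": "mem-001", "similarity": 0.82, "salience": 0.9,
     "last_accessed": time.time() - 3 * 86_400, "link_weight": 0.0},
    {"id": "mem-042", "similarity": 0.55, "salience": 0.4,
     "last_accessed": time.time() - 30 * 86_400, "link_weight": 0.7},
]
ranked = sorted(candidates, key=lambda c: composite_score(
    c["similarity"], c["salience"], c["last_accessed"], c["link_weight"]), reverse=True)
print([c["id"] for c in ranked])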

Diagram of the retrieval flow:

[Figure: openmemory-retrieval-flow]

Example API usage

From README: (GitHub)

Add memory:

curl -X POST http://localhost:8080/memory/add \
  -H "Content-Type: application/json" \
  -d '{"content": "User prefers dark mode"}'

Query memories:

curl -X POST http://localhost:8080/memory/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What are user interface preferences?", "topK": 5}'

Python SDK example snippet (assuming the Python SDK, sdk-py, is installed; the async calls are wrapped in asyncio so the snippet runs as-is):

import asyncio
import openmemory_sdk

async def main():
    client = openmemory_sdk.Client(base_url="http://localhost:8080", api_key="YOUR_KEY")
    # Store a memory with metadata, then query for related memories
    await client.memory.add(content="User prefers dark mode",
                            metadata={"source": "preferences", "category": "UI"})
    resp = await client.memory.query(query="UI preferences", topK=5)
    print(resp.results)

asyncio.run(main())

Workflow JSON example

{
  "action": "memory_query",
  "payload": {
    "query": "What did the user say about project deadlines?",
    "topK": 3
  }
}

This JSON could be sent via REST to /memory/query.
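
For instance, a minimal Python sketch (assuming the requests package, a local instance on port 8080, and that the payload portion maps directly onto the /memory/query body shown in the curl example above):

import requests

payload = {
    "query": "What did the user say about project deadlines?",
    "topK": 3,
}
resp = requests.post("http://localhost:8080/memory/query", json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())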

Use Cases / Scenarios

Conversational AI

Agents can remember user preferences, tone, prior decisions, and conversation history across sessions. For example: “Hey assistant, remind me what I asked you last week about the marketing campaign”. OpenMemory enables recall beyond a single session.
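
A minimal sketch of that pattern over the REST API, assuming the requests package and a hypothetical generate_reply() LLM call: store each user turn as a memory, and query relevant memories before responding.

import requests

BASE = "http://localhost:8080"

def remember(text: str) -> None:
    # Persist the user's message as a memory
    requests.post(f"{BASE}/memory/add", json={"content": text}, timeout=10).raise_for_status()

def recall(question: str, top_k: int = 5) -> list:
    # Fetch the most relevant memories for the current turn
    r = requests.post(f"{BASE}/memory/query", json={"query": question, "topK": top_k}, timeout=10)
    r.raise_for_status()
    return r.json().get("results", [])  # response shape assumed; adjust to the actual API

user_msg = "Remind me what I asked last week about the marketing campaign"
context = recall(user_msg)
print(context)
# reply = generate_reply(user_msg, context)  # hypothetical LLM call using the retrieved context
remember(user_msg)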

Personal Assistants

An assistant agent that accumulates habit data, user goals, tasks, and milestone outcomes. Over time, it builds a memory graph of user interactions and decisions.

Knowledge Management

Ingest documents (PDFs, DOCX, websites, audio) into OpenMemory. The system chunks and embeds the content across sectors, enabling semantic discovery plus path tracking via the graph, as sketched below. This supports corporate knowledge reuse.
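
An illustrative ingestion sketch, assuming plain-text input and the requests package; the chunk size is arbitrary, and passing metadata to /memory/add is an assumption based on the SDK example rather than the documented REST body.

import requests

def ingest_document(text: str, source: str, chunk_size: int = 1000) -> None:
    # Split the document into fixed-size chunks and store each chunk as a memory
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    for n, chunk in enumerate(chunks):
        requests.post(
            "http://localhost:8080/memory/add",
            json={"content": chunk, "metadata": {"source": source, "chunk": n}},
            timeout=10,
        ).raise_for_status()

ingest_document(open("handbook.txt").read(), source="handbook.txt")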

Autonomous Agents / Multi‐Agent Systems

Agents executing workflows can store events, tool outcomes, reflections (via reflective sector), and leverage the graph to recall past decisions, enabling learning loops over time.

Limitations / Considerations

  • Embedding provider cost remains non-zero unless local models are used.

  • SQLite as the storage layer may limit scalability at very large scale; the roadmap indicates that pluggable vector backends (e.g., pgvector, Weaviate) are planned. (GitHub)

  • Decay and salience heuristics need tuning: default parameters may not fit all use-cases.

  • Graph linking uses single-waypoint edges with one-hop expansion; complex reasoning may require multi-hop traversal beyond the built-in logic.

  • Monitoring and management overhead: as memory grows, index size, retrieval latency, and cost of embeddings need tracking.

  • Integration with LLM frameworks: while REST and SDK are available, embedding model compatibility, rate-limits, and latency need to be managed.

Fixes: Common Pitfalls & Troubleshooting

  • Issue: Slow retrieval when the memory size is large.
    Fix: Ensure vector dimensionality (OM_VEC_DIM) matches embedding provider; enable vector quantization or caching; prune low‐salience memories via decay scheduler.

  • Issue: Irrelevant memories returned.
    Fix: Check sector classification, increase OM_MIN_SCORE threshold, review metadata tagging.

  • Issue: Graph links are not being created.
    Fix: Ensure the ingestion pipeline correctly assigns waypoint edges; check scheduler logs (node-cron) for graph repair.

  • Issue: Storage file corruption in SQLite after a crash.
    Fix: Enable WAL mode (Write-Ahead Logging) and backup regularly; run maintenance (VACUUM).

  • Issue: Embedding provider errors or rate-limits.
    Fix: Use local model fallback (e.g., Ollama/E5) or switch provider via OM_EMBEDDINGS env var.

FAQs

Q1: Is OpenMemory free/open-source?
A1: Yes. It is licensed under MIT. (GitHub)

Q2: Can I self-host it and maintain full data ownership?
A2: Yes. The design supports self-hosting (local, Docker, cloud), and data remains under your control. (GitHub)

Q3: What embedding providers are supported?
A3: OpenMemory supports several providers: OpenAI, Gemini, and local models via Ollama (e.g., E5/BGE), so embeddings can also be generated entirely locally. (GitHub)

Q4: How does it handle forgetting or memory decay?
A4: It implements decay curves and reinforcement pulses: each memory is subject to decay unless reinforced via access; sector-specific slopes apply. (Daily.dev)
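
Illustrative only: one plausible shape for such a decay-with-reinforcement rule, reusing the OM_DECAY_LAMBDA value from the example .env; the exact curves, pulse sizes, and sector-specific slopes are not specified in the source.

import math

def decayed_salience(salience: float, days_since_access: float, decay_lambda: float = 0.02) -> float:
    # Exponential decay: memories fade unless they are accessed again
    return salience * math.exp(-decay_lambda * days_since_access)

def reinforce(salience: float, pulse: float = 0.1) -> float:
    # Each access applies a reinforcement pulse, capped at 1.0
    return min(1.0, salience + pulse)

s = 0.8
s = decayed_salience(s, days_since_access=30)  # ~0.44 after a month untouched
s = reinforce(s)                               # bumped back up when recalled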

Q5: Can I integrate this with my existing LLM/agent framework?
A5: Yes. It is framework-agnostic (REST API, SDKs) and can be used with any LLM or agent that supports external memory.

References

  • “OpenMemory” GitHub repository. (GitHub)

  • “Add long-term memory to any AI in minutes…” blog post. (Daily.dev)

  • “Architecting a memory engine…” blog on Supermemory (for context of memory engine space). (supermemory.ai)

  • Additional memory engineering research (“Memory OS of AI Agent”). (arXiv)

Conclusion

OpenMemory provides a robust, open-source solution for adding long-term, structured memory to AI agents and LLM systems. Its multi-sector embedding design, graph-linked retrieval, self-hosted architecture, and cost-efficiency make it compelling for developers aiming for persistent context, personalization, and ownership of data. When applied thoughtfully, it enhances the continuity, coherence, and intelligence of AI workflows. As memory needs scale, attention to embedding selection, storage tuning, and integration overhead remains key.