
Introducing Orion v0.1 — persistent memory for AI agents

Andy Wu
2026-04-28 · 8 min

Every AI coding agent today has the same problem: it starts from zero. You explain your architecture, your conventions, your constraints — and the moment the context window resets, it's all gone. The next morning, you start over.

We built Orion to fix this. After six months of building, dogfooding, and rebuilding, we're releasing a local-first persistent memory system that gives any MCP-compatible agent structured, searchable, evolving knowledge.

Three API calls — brain.orient, brain.think, brain.recall — are all it takes. The agent orients at session start, thinks to store knowledge, and recalls to retrieve it. Over time, the agent accumulates expertise, detects contradictions in its own knowledge, and builds a graph of every concept it encounters.
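To make the lifecycle concrete, here is a toy in-memory stand-in for the three verbs. Real calls go over MCP to the Orion server; the class name, method signatures, and record shape below are illustrative assumptions, not Orion's actual API surface.

```python
# Toy sketch of the orient → think → recall lifecycle.
# Payload shapes are assumptions for illustration only.
class ToyBrain:
    def __init__(self):
        self.records = []

    def orient(self, project):
        # Session start: surface what is already known about the project.
        return [r for r in self.records if r["project"] == project]

    def think(self, project, content):
        # Store a new piece of knowledge.
        self.records.append({"project": project, "content": content})

    def recall(self, project, query):
        # Naive substring match standing in for Orion's fused search.
        return [r["content"] for r in self.records
                if r["project"] == project and query.lower() in r["content"].lower()]
```

The point of the shape, not the implementation: knowledge written in one session (`think`) is available to every later one (`orient`, `recall`).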

The problem we set out to solve

Context windows are getting larger, but they're still ephemeral. A 200K-token window doesn't help when the decision you need was made three weeks ago in a different session. RAG over a codebase gets you file contents, but not the reasoning behind the code — why you chose Postgres over MySQL, why you switched from JWT to session tokens, why that particular retry strategy exists.

We wanted agents that accumulate expertise the way engineers do: gradually, across sessions, with the ability to recall not just facts but the reasoning behind them.

What makes Orion different

Typed knowledge. Every memory record is tagged with a cognitive region — analytical, procedural, contextual, creative, empathetic, critical, or strategic. This isn't cosmetic. Each region has tuned cache TTLs, retrieval weights, and reasoning prompts. When you search for "how to deploy," Orion prioritizes exact procedural matches. When you search for "database decisions," it prioritizes semantically related analytical records. In our benchmarks, region-aware retrieval improved precision@5 by 28%.
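The retrieval-weight idea can be sketched in a few lines. The weight values below are made up for illustration — Orion's actual per-region tuning is not published in this post — but they show the mechanism: the same two similarity signals, blended differently per region.

```python
# Hypothetical per-region retrieval weights (illustrative values, not Orion's tuning).
REGION_WEIGHTS = {
    "procedural": {"exact": 0.8, "semantic": 0.2},  # "how to deploy" → favor exact matches
    "analytical": {"exact": 0.3, "semantic": 0.7},  # "database decisions" → favor semantics
}
DEFAULT = {"exact": 0.5, "semantic": 0.5}

def region_score(region, exact_sim, semantic_sim):
    """Blend exact-match and semantic similarity using region-specific weights."""
    w = REGION_WEIGHTS.get(region, DEFAULT)
    return w["exact"] * exact_sim + w["semantic"] * semantic_sim
```

An exact keyword hit thus ranks a procedural record above an analytical one with the same raw similarities, which is the behavior the prose describes.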

Zero-LLM knowledge graph. Every brain.think call extracts entities and typed relationships using regex patterns — no API calls, no GPU, deterministic results. The graph builds automatically: "We switched from Flask to FastAPI" creates a REPLACES edge. "Auth depends on Redis" creates a DEPENDS_ON edge. After a few weeks of use, you have a navigable map of every concept, tool, and decision in your project.
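A minimal sketch of how regex-driven edge extraction can work, using the two sentences from the paragraph above. These two patterns are illustrative assumptions — Orion's actual rule set is larger — but the approach is the same: deterministic patterns, no model in the loop.

```python
import re

# Illustrative patterns, not Orion's actual rule set.
# Each entry: (pattern, edge type, group order — which capture is source vs target).
EDGE_PATTERNS = [
    # "switched from X to Y" → Y REPLACES X
    (re.compile(r"switched from (\w+) to (\w+)", re.I), "REPLACES",
     lambda m: (m.group(2), m.group(1))),
    # "X depends on Y" → X DEPENDS_ON Y
    (re.compile(r"(\w+) depends on (\w+)", re.I), "DEPENDS_ON",
     lambda m: (m.group(1), m.group(2))),
]

def extract_edges(text):
    """Return (source, relation, target) triples found in text."""
    edges = []
    for pattern, rel, order in EDGE_PATTERNS:
        for m in pattern.finditer(text):
            src, dst = order(m)
            edges.append((src, rel, dst))
    return edges
```

Because the patterns are plain regex, the same input always yields the same graph — which is what makes the extraction free to run on every single brain.think call.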

Model continuity. Agent identities persist across model switches. When you go from Claude Opus to Sonnet, Orion detects the switch, assesses continuity (same family = 0.95, cross-family = 0.7), and generates a transition brief. The new model picks up exactly where the old one left off.
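The continuity scores from the paragraph can be expressed as a tiny function. How Orion actually identifies a model's family is not described in this post; the name-prefix parsing below is an assumption for illustration, while the 0.95 and 0.7 values come from the text.

```python
def continuity_score(old_model: str, new_model: str) -> float:
    """Assess continuity across a model switch.

    Same family scores 0.95, cross-family 0.7 (values from the post).
    Family detection via name prefix is a simplifying assumption.
    """
    old_family = old_model.split("-")[0]
    new_family = new_model.split("-")[0]
    return 0.95 if old_family == new_family else 0.7
```

A score below some threshold could then trigger a richer transition brief for the incoming model.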

Local-first, no exceptions. Orion runs entirely on your machine — Postgres, Redis, ChromaDB, Ollama, the API, and the dashboard, all in Docker. No cloud, no accounts, no telemetry. Your knowledge graph is a map of how you think. That data never leaves your device.

Architecture in 30 seconds

AI Tool (Claude Code, Cursor, etc.)
  ↓ MCP (16 tools)
orion-api (FastAPI)
  ├── Redis     → hot cache, per-region TTLs (1h–7d)
  ├── ChromaDB  → semantic vectors, 7 collections per galaxy
  └── Postgres  → structural spine, knowledge graph, agent identities

Search uses Reciprocal Rank Fusion to blend three signals — keyword cache hits, vector similarity, and graph traversal — into a single ranked list. No single signal is sufficient; the fusion is what makes retrieval work.
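Reciprocal Rank Fusion itself is a standard, simple algorithm: each signal contributes 1/(k + rank) for every document it returns, and the sums are sorted. The sketch below shows the core of it (the k=60 default is the conventional choice from the RRF literature, not a value Orion documents here).

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion over several ranked lists.

    score(d) = sum over each list containing d of 1 / (k + rank_in_that_list)
    Documents ranked well by multiple signals rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The useful property for a memory system: a record that the keyword cache, the vector index, and the graph walk all rank moderately high beats a record only one signal loves — which is exactly the "no single signal is sufficient" claim above.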

Early results

We've been dogfooding Orion for the past three months. Some numbers:

  • ~200ms end-to-end latency for brain.think (entity extraction + graph linking + embedding + write)
  • sub-1ms cache hits on brain.recall for recent knowledge
  • 28% improvement in retrieval precision with cognitive region typing vs. untyped
  • 34% improvement in retrieval relevance after adding recency decay to RRF

The calibration loop (brain.calibrate at session end) produces measurable improvement over time. Agents that calibrate consistently surface better context in subsequent sessions — the confidence scores converge toward genuinely useful knowledge.

Get started

git clone https://github.com/aw537/orion && cd orion
cp .env.example .env
docker compose up

Connect your AI tool to http://localhost:8000/mcp and call brain.orient. That's it.

Orion is source-available under the Functional Source License. Read the docs, explore the architecture, or dive into the MCP tools reference.