Engineering

Zero-LLM entity extraction: building a knowledge graph on a laptop

Andy Wu
2026-04-08 · 9 min

Every time an agent calls brain.think, Orion runs an 8-step knowledge integration pipeline: write the record, embed it, extract entities, extract relationships, process supersession chains, update backlinks, update expertise profiles, and log the event. The entire pipeline completes in ~200ms on a laptop. No GPU. No external API calls.

The key constraint that shaped this design: Orion must run on any machine without a GPU. That rules out transformer-based NER models, which are the standard approach for entity extraction. We needed something that's fast, deterministic, and good enough.

The two-pass extraction strategy

Pass 1: Regex pattern matching. A set of ~40 patterns that catch code symbols (backtick-wrapped text, CamelCase identifiers, dot-notation paths), URLs, file paths, technology names from a curated dictionary, and common natural language patterns like "switched from X to Y" or "A depends on B."

This pass is deterministic, runs in ~2ms, and catches approximately 60% of entities in typical engineering knowledge.
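The regex pass can be sketched in a few lines. The handful of patterns and entity-type names below are illustrative stand-ins for the ~40 described above, not Orion's actual pattern set:

```python
import re

# A few illustrative patterns from the categories named above:
# backtick-wrapped code, CamelCase identifiers, dot-notation paths, URLs.
ENTITY_PATTERNS = [
    ("code_symbol", re.compile(r"`([^`]+)`")),
    ("camel_case",  re.compile(r"\b[A-Z][a-z]+[A-Z]\w*\b")),
    ("dotted_path", re.compile(r"\b([a-z_]+(?:\.[a-z_]+)+)\b")),
    ("url",         re.compile(r"https?://\S+")),
]

def extract_entities(text):
    """Run every pattern over the text; return {(name, type), ...}."""
    entities = set()
    for etype, pattern in ENTITY_PATTERNS:
        for match in pattern.finditer(text):
            # Use the capture group when the pattern has one, else the whole match.
            name = match.group(1) if match.groups() else match.group(0)
            entities.add((name, etype))
    return entities

ents = extract_entities("brain.think calls `ChromaDB` via FastAPI")
```

Because the pass is pure regex, the same input always yields the same entity set, which is what makes it safe to run on every single write.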

Pass 2: Optional LLM extraction. A single prompt to the configured LLM (Ollama's Llama 3.2 3B by default) asking it to identify entities and their types from the content. This adds ~130ms but pushes coverage to ~90%.

The LLM pass is optional. If no LLM is configured, pattern matching alone provides a usable graph. This matters because it means Orion's core value proposition — structured, graph-linked knowledge — works even in environments where running a local LLM isn't practical.
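A pass like this can be sketched against Ollama's standard HTTP endpoint. The prompt wording, entity schema, and fallback behavior below are assumptions for illustration, not Orion's actual code; the graceful empty-list fallback is what lets the regex pass stand alone:

```python
import json
import urllib.request

# Hypothetical prompt -- the real wording is not specified in the post.
PROMPT = (
    "List the entities (tools, libraries, files, concepts) in the text below. "
    "Reply with only a JSON array of objects with \"name\" and \"type\" keys.\n\n"
)

def llm_extract(content, model="llama3.2:3b", base_url="http://localhost:11434"):
    """Ask a local Ollama model for entities; return [] if no LLM is reachable."""
    payload = json.dumps(
        {"model": model, "prompt": PROMPT + content, "stream": False}
    ).encode()
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = json.load(resp)
        return json.loads(body["response"])
    except (OSError, ValueError, KeyError):
        # No LLM configured or reachable: degrade to regex-only extraction.
        return []
```

If the call fails for any reason, the pipeline simply proceeds with whatever the regex pass found.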

Relationship extraction

Relationships are extracted from the same content using pattern matching. The patterns detect directional relationships from natural language:

Pattern                              Extracted relationship
"switched from X to Y"               Y REPLACES X
"X uses Y" / "X with Y"              X USES Y
"X depends on Y" / "X requires Y"    X DEPENDS_ON Y
"X works with Y" / "X alongside Y"   X WORKS_WITH Y
"X is part of Y" / "X within Y"      X PART_OF Y

Each extracted relationship creates a typed, weighted edge in the knowledge graph. Edge weights increment on re-observation — if multiple stardust records mention that "FastAPI uses Pydantic," the USES edge between them gets stronger.
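The pattern-to-edge step can be sketched as follows. The patterns are simplified versions of the table above, and the edge representation is an assumption for illustration:

```python
import re
from collections import defaultdict

# (pattern, relationship type, swap source/target?) -- simplified from the
# table above. "switched from X to Y" swaps, because Y REPLACES X.
REL_PATTERNS = [
    (re.compile(r"switched from (\w+) to (\w+)", re.I), "REPLACES", True),
    (re.compile(r"(\w+) uses (\w+)", re.I), "USES", False),
    (re.compile(r"(\w+) depends on (\w+)", re.I), "DEPENDS_ON", False),
]

edges = defaultdict(int)  # (source, type, target) -> weight

def observe(text):
    """Extract relationship triples from text and strengthen their edges."""
    for pattern, rel, swap in REL_PATTERNS:
        for m in pattern.finditer(text):
            a, b = m.group(1), m.group(2)
            edge = (b, rel, a) if swap else (a, rel, b)
            edges[edge] += 1  # re-observation increments the weight

observe("FastAPI uses Pydantic")
observe("FastAPI uses Pydantic")
observe("switched from Flask to FastAPI")
```

After those three observations, the USES edge between FastAPI and Pydantic has weight 2 while the REPLACES edge has weight 1, matching the strengthening behavior described above.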

Graph linking

Extracted entities are matched against existing entities using case-insensitive fuzzy string matching (Levenshtein distance < 3). This handles near-miss spellings: "Postgres" matches "PostgreSQL." Abbreviations like "k8s" for "Kubernetes" need separate handling, since their edit distance is far beyond the threshold.
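The matching step can be sketched with a plain dynamic-programming Levenshtein and the < 3 threshold; the function names are illustrative, not Orion's:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def match_entity(name, existing):
    """Return the closest known entity within distance 3, else None."""
    name = name.lower()
    best = min(existing, key=lambda e: levenshtein(name, e.lower()))
    return best if levenshtein(name, best.lower()) < 3 else None

match_entity("Postgres", ["PostgreSQL", "Redis"])  # "PostgreSQL"
```

"postgres" is two insertions away from "postgresql", so it lands under the threshold; an unrelated name like "Kafka" matches nothing and becomes a new entity.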

New entities are created automatically with tier 1 (mentioned). As mention count increases, entities promote to tier 2 (referenced, 5+ mentions) and tier 3 (core, 15+ mentions). Tier affects retrieval ranking — core entities surface more prominently in brain.recall results.
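The tier thresholds above can be sketched directly; the entity dict shape here is an assumption, not Orion's schema:

```python
# (minimum mentions, tier): 15+ -> core, 5+ -> referenced, else mentioned.
TIERS = [(15, 3), (5, 2), (0, 1)]

def tier_for(mentions):
    for threshold, tier in TIERS:
        if mentions >= threshold:
            return tier

def record_mention(entity):
    """Bump an entity's mention count and promote its tier if warranted."""
    entity["mentions"] += 1
    entity["tier"] = tier_for(entity["mentions"])
    return entity

fastapi = {"name": "FastAPI", "mentions": 4, "tier": 1}
record_mention(fastapi)  # 5th mention promotes it to tier 2 (referenced)
```

Promotion is monotonic: mention counts only grow, so an entity never drops back to a lower tier.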

Performance budget

On an M2 MacBook Air with Ollama running Llama 3.2 3B:

Step                        Latency   % of total
Postgres write              3ms       1.5%
ChromaDB embed + upsert     45ms      22.5%
Regex entity extraction     2ms       1%
LLM entity extraction       130ms     65%
Relationship extraction     1ms       0.5%
Graph linking + backlinks   12ms      6%
Expertise profile update    3ms       1.5%
Nebula event log            2ms       1%
Total                       ~200ms

The LLM call dominates. Without it, the pipeline completes in ~70ms. We made the LLM pass optional specifically because the 130ms cost isn't always worth the incremental coverage — especially for short, structured content where regex patterns already catch most entities.

Transitive inference

The weekly audit extends the graph through transitive inference:

A USES B + B DEPENDS_ON C → A INDIRECTLY_DEPENDS_ON C
A USES B + B USES C       → A INDIRECTLY_USES C

Inferred edges are marked with inferred=True and carry lower confidence than observed edges. They're useful for discovery ("what does this project indirectly depend on?") but don't pollute the primary graph.
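The two rules above can be sketched as a pass over observed edges; the (source, type, target) tuple shape is an assumption carried over for illustration:

```python
def infer(edges):
    """Derive INDIRECTLY_* edges from USES -> DEPENDS_ON / USES chains."""
    observed = set(edges)
    inferred = set()
    for s, r, t in observed:
        if r != "USES":
            continue
        # Follow one hop from the target of each USES edge.
        for s2, r2, t2 in observed:
            if s2 != t:
                continue
            if r2 == "DEPENDS_ON":
                inferred.add((s, "INDIRECTLY_DEPENDS_ON", t2))
            elif r2 == "USES":
                inferred.add((s, "INDIRECTLY_USES", t2))
    # Returned separately so callers can store them with inferred=True
    # and lower confidence, apart from the observed edges.
    return inferred

edges = [("App", "USES", "FastAPI"), ("FastAPI", "DEPENDS_ON", "Pydantic")]
```

Running this over the two observed edges yields the single inferred edge App INDIRECTLY_DEPENDS_ON Pydantic.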

Why this matters

A knowledge graph that builds itself from natural language — without requiring the user to manually tag, link, or categorize anything — is what makes Orion's retrieval work. The graph is the third signal in our RRF search pipeline: when you query for "database architecture," Orion finds the entity "database," traverses one hop outward, and includes connected stardust in the ranked results.
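The one-hop expansion used as the graph signal can be sketched like this; the edge tuples and entity names are illustrative:

```python
def one_hop(entity, edges):
    """Return every entity directly connected to `entity`, in either direction."""
    neighbors = set()
    for source, _, target in edges:
        if source == entity:
            neighbors.add(target)
        elif target == entity:
            neighbors.add(source)
    return neighbors

edges = [
    ("database", "PART_OF", "architecture"),
    ("Postgres", "USES", "database"),
]
one_hop("database", edges)  # {"architecture", "Postgres"}
```

Stardust linked to any of those neighbors is then merged into the ranked results alongside the keyword and vector signals.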

After a few weeks of use, the graph becomes a navigable map of every concept, tool, and decision in your project. And it cost zero API calls to build.

Explore the knowledge graph docs or try orion graph query <entity> to see your graph.