The hardest problem in building a memory system for AI agents isn't storage — it's retrieval. The query is natural language, the knowledge is heterogeneous, and the right answer depends on context that changes every session.
We tried vector search alone. We tried keyword search alone. We tried graph traversal alone. None of them worked well enough. The solution was to stop choosing and blend all three.
Why single-signal retrieval fails
Vector similarity finds semantically related content but misses exact matches. Ask for "the kubectl deploy command" and you'll get paragraphs about deployment instead of the literal command.
Keyword search finds exact matches but misses semantic relationships. Ask for "database architecture decisions" and you'll miss the record that says "We chose Postgres for better JSON support" because it doesn't contain the word "architecture."
Graph traversal finds connected concepts but misses everything that isn't explicitly linked. It's powerful for "what relates to Redis?" but useless for "how do I deploy?"
Agent memory needs all three, weighted and fused.
Reciprocal Rank Fusion
RRF is a rank aggregation method that combines multiple ranked lists without requiring score normalization. For a document d appearing across N ranked lists:
score(d) = Σ_{i=1..N} 1 / (k + rank_i(d) + 1)
Where k is a smoothing constant (we use 60, following the original Cormack et al. paper). The key insight: RRF operates on rank positions, not raw scores. This means you can fuse rankings from completely different scoring systems — cosine similarity, BM25, graph degree — without normalizing them into a common scale.
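A minimal sketch of the fusion step, assuming each signal returns an ordered list of stardust IDs and that ranks are zero-indexed (which is why the formula carries the +1 relative to the paper's 1/(k + r) with one-indexed ranks). Documents missing from a list simply contribute nothing for that signal.

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse N ranked lists of stardust IDs with Reciprocal Rank Fusion.

    Ranks are zero-indexed, so 1 / (k + rank + 1) is equivalent to the
    paper's 1 / (k + r) with one-indexed ranks. IDs absent from a list
    are simply skipped for that list.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids):
            scores[doc_id] += 1.0 / (k + rank + 1)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A document that shows up in two or three lists accumulates score from each, so agreement across signals is what pushes a result to the top.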
Our three ranking signals
Signal 1: Redis keyword cache. Substring match against recently written and accessed stardust. Weighted by recency — a record written yesterday ranks higher than one from last month. This signal is fast (sub-millisecond) and catches exact matches that vector search misses.
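A sketch of what this signal could look like, assuming a hypothetical layout: a sorted set `stardust:recent` scored by last-touch timestamp and a hash `stardust:content` mapping ID to text. The real cache schema may differ.

```python
import redis

r = redis.Redis(decode_responses=True)

def keyword_signal(query: str, limit: int = 20, window: int = 500) -> list[str]:
    """Substring match over recently touched stardust, newest first.

    Assumes 'stardust:recent' is a sorted set scored by last-touch unix
    timestamp and 'stardust:content' is a hash of ID -> text.
    """
    needle = query.lower()
    hits: list[str] = []
    # zrevrange returns newest entries first, which gives recency ordering for free
    for sid, _ts in r.zrevrange("stardust:recent", 0, window - 1, withscores=True):
        content = r.hget("stardust:content", sid) or ""
        if needle in content.lower():
            hits.append(sid)
            if len(hits) >= limit:
                break
    return hits
```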
Signal 2: ChromaDB semantic vectors. Cosine similarity against embedded stardust content, partitioned by cognitive region. We query multiple region collections in parallel and merge results. This signal catches semantic relationships but is slower (~50ms per region).
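A sketch of the parallel region queries, assuming one ChromaDB collection per cognitive region, named after the region. The merge strategy here (a simple round-robin interleave) is illustrative, not necessarily the production behavior.

```python
import asyncio
import chromadb

client = chromadb.Client()  # a persistent client works the same way
REGIONS = ["procedural", "analytical", "contextual", "strategic"]  # assumed names

def _query_region(region: str, query: str, n: int) -> list[str]:
    collection = client.get_collection(region)
    result = collection.query(query_texts=[query], n_results=n)
    return result["ids"][0]  # ranked IDs for the single query text

async def semantic_signal(query: str, n: int = 10) -> list[str]:
    """Query every region collection concurrently, then interleave the results."""
    per_region = await asyncio.gather(
        *(asyncio.to_thread(_query_region, region, query, n) for region in REGIONS)
    )
    merged, seen = [], set()
    for rank in range(n):  # round-robin keeps each region's best hits near the top
        for ids in per_region:
            if rank < len(ids) and ids[rank] not in seen:
                seen.add(ids[rank])
                merged.append(ids[rank])
    return merged
```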
Signal 3: PostgreSQL graph hops. Extract entities from the query, look them up in the knowledge graph, traverse one hop outward, and return connected stardust. This signal surfaces knowledge that's conceptually related but lexically and semantically distant.
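A sketch of the one-hop traversal, assuming a hypothetical schema of `entities(id, name)`, `edges(src_entity, dst_entity)`, and `stardust_entities(stardust_id, entity_id)`, with results ordered by a rough connection count. Entity extraction from the query happens upstream and isn't shown.

```python
import asyncpg

async def graph_signal(entities: list[str], dsn: str) -> list[str]:
    """Entities extracted from the query -> one hop outward -> connected stardust."""
    if not entities:
        return []
    conn = await asyncpg.connect(dsn)
    try:
        rows = await conn.fetch(
            """
            SELECT se.stardust_id, COUNT(*) AS connections
            FROM entities e
            JOIN edges g ON g.src_entity = e.id OR g.dst_entity = e.id
            JOIN stardust_entities se
              ON se.entity_id IN (g.src_entity, g.dst_entity)
            WHERE e.name = ANY($1::text[])
            GROUP BY se.stardust_id
            ORDER BY connections DESC
            """,
            entities,
        )
        return [row["stardust_id"] for row in rows]
    finally:
        await conn.close()
```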
The recency discovery
We expected recency to be a minor factor. It turned out to be the single most important signal for agent memory retrieval.
The reason is straightforward: agents overwhelmingly need recent context. The decision from yesterday matters more than the one from three months ago. The current architecture matters more than the one you migrated away from.
We added a recency decay multiplier to the cache signal: weight halves every 7 days. This single change improved retrieval relevance by 34% in our benchmarks — more than any other tuning we did.
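The decay itself is one line, assuming write timestamps in unix seconds:

```python
import time

HALF_LIFE_DAYS = 7.0

def recency_weight(written_at: float, now: float | None = None) -> float:
    """Exponential decay: a record's weight halves every HALF_LIFE_DAYS."""
    now = now if now is not None else time.time()
    age_days = max(0.0, (now - written_at) / 86_400)
    return 0.5 ** (age_days / HALF_LIFE_DAYS)
```

Under this curve a record from yesterday keeps about 91% of its weight, while one from a month ago keeps roughly 5%.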
Per-region weight tuning
Different cognitive regions have different retrieval profiles:
| Region | Primary signal | Why |
|---|---|---|
| procedural | Keyword | You want the exact command, not a paraphrase |
| analytical | Semantic | You want related decisions, not keyword matches |
| contextual | Recency | Recent context trumps old context |
| strategic | Semantic | Goals are expressed in varied language |
We tune RRF's k parameter and signal weights per region. Procedural queries get a lower k (sharper ranking, favoring top keyword hits). Analytical queries get a higher k (smoother blending across signals).
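A sketch of how the per-region profiles might be wired into a weighted variant of the fusion. The k values and weights below are illustrative, not the production numbers.

```python
REGION_PROFILES = {
    # lower k: sharper ranking, the top keyword hit dominates
    "procedural": {"k": 20, "weights": {"keyword": 1.0, "semantic": 0.5, "graph": 0.3}},
    # higher k: smoother blending across signals
    "analytical": {"k": 90, "weights": {"keyword": 0.4, "semantic": 1.0, "graph": 0.6}},
    "contextual": {"k": 60, "weights": {"keyword": 1.0, "semantic": 0.7, "graph": 0.3}},
    "strategic":  {"k": 90, "weights": {"keyword": 0.3, "semantic": 1.0, "graph": 0.5}},
}

def weighted_rrf(rankings: dict[str, list[str]], region: str) -> list[tuple[str, float]]:
    """RRF with a per-region k and per-signal weights."""
    profile = REGION_PROFILES[region]
    k = profile["k"]
    scores: dict[str, float] = {}
    for signal, ranked_ids in rankings.items():
        weight = profile["weights"].get(signal, 1.0)
        for rank, doc_id in enumerate(ranked_ids):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank + 1)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```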
Results
On our internal benchmark (500 query-answer pairs across 3 galaxies):
| Configuration | Precision@5 |
|---|---|
| Vector only | 0.52 |
| Keyword only | 0.41 |
| Graph only | 0.29 |
| RRF (uniform weights) | 0.68 |
| RRF (per-region weights) | 0.73 |
| RRF + recency decay | 0.78 |
The gap between single-signal and fused retrieval is substantial: uniform RRF beats the best single signal by 16 points of Precision@5. Per-region tuning adds another five points, and recency decay five more on top of that.
Implementation notes
The entire search pipeline runs in a single async function; the Redis, ChromaDB, and graph queries all execute concurrently. Total latency for a typical search is 80–150ms, dominated by the ChromaDB embedding lookup.
RRF fusion itself is trivial — a dictionary accumulating 1/(k + rank + 1) per document. The engineering effort is in building the three ranking signals and tuning their interaction.
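Tying the earlier sketches together, the orchestration could look something like this (the signal functions and `weighted_rrf` are the hypothetical sketches above, not the actual codebase):

```python
import asyncio

async def search(query: str, region: str, entities: list[str], dsn: str):
    """Run the three signals concurrently, then fuse their rankings."""
    keyword_ids, semantic_ids, graph_ids = await asyncio.gather(
        asyncio.to_thread(keyword_signal, query),  # sync Redis call off the event loop
        semantic_signal(query),
        graph_signal(entities, dsn),
    )
    return weighted_rrf(
        {"keyword": keyword_ids, "semantic": semantic_ids, "graph": graph_ids},
        region,
    )
```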
Explore the search architecture or try it yourself: `orion memory search "your query"`.