How Search Works
Every search query passes through a multi-stage retrieval pipeline before results reach your AI. This page walks through each stage, the constants that govern it, and why the pipeline is shaped the way it is.
Two Search Modes
Gnosis exposes two search tools, tuned for different jobs.
Standard Search
Up to 32 results. A 2x-oversampled candidate pool of 64, scored by the cross-encoder in a single pass. Designed for conversational use: the AI searches, scans previews, retrieves 2–3 in full. Sub-200ms in production. When the result set is small enough to skip reranking, responses return in under 40ms.
Use for targeted lookups: "what did we decide about auth?", "find my postgres connection details", "how does the caching layer work?"
Deep Search
Up to 100 results. A candidate pool of up to 256, scored by the cross-encoder reranker in a single pass. Requires 2+ keywords to prevent context overflow on broad terms. Typically under 350ms, including full cross-encoder reranking across the entire candidate pool.
Use for comprehensive coverage: "everything about the auth refactor", "all decisions related to deployment architecture". When the goal is exhaustive recall, not a quick answer.
The Pipeline
Both search modes run the same pipeline. Deep search feeds it more candidates and gets more results back. The stages, in execution order:
Stage 1: Query Embedding
The search query is converted into a dense vector representation — a point in the same high-dimensional mathematical space where all stored memories live. This embedding captures meaning, not surface text. "Database performance tuning" and "optimizing slow queries" map to nearby regions even though they share no words.
At the same time, the query is tokenized into individual keywords for the parallel topic lookup. Both operations happen before any database work begins.
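A minimal sketch of this preparation step, assuming a stand-in `embed` function for the real embedding model and an illustrative stopword list (neither is part of the documented API):

```python
import re

# Illustrative stopword list; the real tokenizer's vocabulary is not specified.
STOPWORDS = {"the", "a", "about", "my", "how", "does", "what", "did", "we"}

def prepare_query(query, embed):
    """Embed the query and extract keywords before any database work begins."""
    vector = embed(query)  # dense semantic representation (stand-in model)
    keywords = [
        t for t in re.findall(r"[a-z0-9_]+", query.lower())
        if t not in STOPWORDS
    ]
    return vector, keywords
```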
Stage 2: Parallel Candidate Fetch
Two retrieval paths run against the database:
- Vector similarity search — finds the nearest memories by embedding distance. For standard search, this fetches up to 64 candidates. For deep search, 256. These are the semantic matches: memories whose meaning is closest to your query, regardless of the exact words used.
- Topic hash lookup — finds memories whose topic tags match the extracted keywords. Fetches 75% of the vector pool size (48 for standard, 192 for deep). Topic matching is a binary hash check — fast and exact. It catches results that vector search might rank lower because they use different vocabulary but are literally tagged with the query term.
Both paths are sub-3ms with warmed prepared statements. The database is cheap; the reranker is the bottleneck. Fetching generously here costs nearly nothing and prevents good candidates from being cut before they reach the scoring stages.
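The fan-out above can be sketched as two concurrent fetches with the documented pool sizes (64/256 for vector, 75% of that for topics); the fetch functions themselves are stand-ins, not the real query layer:

```python
from concurrent.futures import ThreadPoolExecutor

VECTOR_POOL = {"standard": 64, "deep": 256}

def topic_pool(mode):
    """Topic path fetches 75% of the vector pool: 48 standard, 192 deep."""
    return int(VECTOR_POOL[mode] * 0.75)

def fetch_candidates(mode, vector_fetch, topic_fetch):
    """Run both retrieval paths concurrently and return their result lists."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        vec = pool.submit(vector_fetch, VECTOR_POOL[mode])
        top = pool.submit(topic_fetch, topic_pool(mode))
        return vec.result(), top.result()
```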
Stage 3: Collection Fan-Out
Search automatically spans your personal memory namespace and every shared collection you belong to. No configuration, no extra parameters. A single query fans out across all scopes simultaneously.
When memories are published to shared collections, the same content exists in multiple namespaces. The database can't deduplicate these — it returns rows by distance, and identical embeddings at the same distance consume result slots, displacing unique candidates. Gnosis compensates by over-fetching (up to 2x when collections are present), then deduplicating by content hash and truncating back to the original target pool size. The result: collection users see the same retrieval quality as single-namespace users.
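A sketch of the dedup-and-truncate step, assuming rows arrive already ranked by distance and that content hashing uses a standard digest (the real hash choice is not specified here):

```python
import hashlib

def dedupe_by_content_hash(rows, target_pool):
    """Keep the first (best-ranked) occurrence of each content hash,
    then truncate the over-fetched pool back to the target size."""
    seen, out = set(), []
    for content in rows:
        h = hashlib.sha256(content.encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            out.append(content)
        if len(out) == target_pool:
            break
    return out
```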
Stage 4: Reciprocal Rank Fusion
The vector results and topic results are two separate ranked lists. Reciprocal Rank Fusion (RRF) combines them into a single ranking without either source dominating.
The formula is simple: for each document, its RRF score is the sum of 1 / (rank + k + 1) across every list where it appears. Documents found by both vector search and topic matching get contributions from both lists. The constant k = 60 controls rank sensitivity — higher values flatten the curve, giving lower-ranked results more weight relative to top-ranked ones.
At the RRF stage, two type boosts are applied before the pool is sorted:
- Preferences receive a 1.3x RRF multiplier. Preferences make up roughly 5% of a typical corpus and are easily drowned out by facts. The boost ensures they survive the pool cap when relevant.
- Summaries receive a 1.4x RRF multiplier. Executive summaries are high-value, low-frequency documents that compress dozens of individual memories into one. They should surface near the top when the topic matches.
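The fusion and type boosts above can be sketched directly from the formula (the function shape is an assumption; the constants k = 60, 1.3x, and 1.4x come from the text, and ranks are 0-based so the first item contributes 1/(k + 1)):

```python
K = 60  # rank-sensitivity constant from the RRF formula

def rrf_scores(ranked_lists, type_of, k=K):
    """Fuse ranked lists of doc ids: score = sum of 1 / (rank + k + 1)."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked):  # rank is 0-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k + 1)
    # Type boosts applied before the pool is sorted.
    boosts = {"preference": 1.3, "summary": 1.4}
    for doc_id in scores:
        scores[doc_id] *= boosts.get(type_of.get(doc_id), 1.0)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A document found by both paths ("b" below) outranks every single-path document, and a boosted preference can climb past an unboosted fact at a nearby rank.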
Why RRF matters
Vector similarity and keyword matching solve different problems. Neither alone is sufficient for a memory system.
Vector search excels at meaning. It finds "query optimization notes" when you search "database performance" because the concepts occupy nearby regions in embedding space. But it struggles with proper nouns, project names, function names, and identifiers — words that carry high search intent but low semantic weight. Searching for "postgres" might rank a memory about relational database theory above a memory literally titled "Postgres connection string" because the theory text has more semantic overlap with the query.
Topic matching excels at exact terms. If a memory is tagged "postgres", a keyword search for "postgres" will always find it. But topic matching has no concept of meaning — it can't connect "database performance" to "query optimization" because the words don't match.
RRF combines both signals without calibration. The two retrieval paths use different score scales (cosine similarity vs binary match), and RRF doesn't care — it operates on ranks, not scores. A document ranked #1 by both paths gets a strong combined score. A document ranked #1 by one path and absent from the other still scores well. The fusion is additive and non-destructive.
Stage 5: Cross-Encoder Reranking
After RRF fusion, the combined candidate pool is scored by a dedicated cross-encoder reranker. This is the most computationally expensive stage — and the most important.
Unlike the embedding stage (which independently encodes query and memory, then compares vectors), the cross-encoder reads the query and each candidate together as a single input. It sees the full text of both, applies attention across the pair, and outputs a direct relevance score. This catches nuances that vector similarity misses: negation, conditional relevance, partial matches, and context-dependent meaning.
The entire candidate pool is scored in a single pass. Standard search sends up to 64 candidates through the reranker; deep search sends up to 256, producing up to 100 scored results.
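The single-pass shape looks roughly like this; the real scorer is a neural cross-encoder, so `toy_score` below is only a lexical stand-in that lets the sketch run:

```python
def rerank(query, candidates, score_pair):
    """Score every (query, candidate) pair jointly, then sort by relevance."""
    scored = [(c, score_pair(query, c)) for c in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)

def toy_score(query, text):
    """Stand-in pair scorer: fraction of query words present in the text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(len(q), 1)
```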
Stage 6: Post-Rerank Adjustments
After the reranker assigns raw relevance scores, a series of targeted adjustments run in sequence:
- Preference boost — preferences receive a 1.25x post-rerank multiplier (capped at 1.0). Combined with the 1.3x RRF boost, preferences get a two-stage lift that's enough to compete with facts without overwhelming them.
- Summary boost — summaries receive a 1.3x post-rerank multiplier (capped at 1.0). Summaries compress 20+ individual memories into one, so their per-token value is high. The boost reflects that.
- Result demotion — memories of type result (operational findings linked to tasks) receive a 0.6x multiplier. These are useful in the context of their parent task but are low-value as standalone search results.
- Done penalty — completed items (types ending in _done) receive a 0.7x multiplier. Completed tasks and superseded decisions should sink below active ones. A done preference gets a net multiplier of 1.25 × 0.7 = 0.875 — still visible, but below its active equivalent.
After each adjustment group, results are re-sorted by score. The adjustments are multiplicative and composable — they stack predictably without interaction effects.
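The four adjustments compose as plain multipliers. A sketch, assuming the type strings ("preference", "summary", "result", the "_done" suffix) map one-to-one onto the memory types described above:

```python
def adjust(score, mem_type):
    """Apply the post-rerank multipliers: boosts are capped at 1.0,
    demotions and the done penalty stack multiplicatively."""
    if mem_type.startswith("preference"):
        score = min(score * 1.25, 1.0)   # preference boost, capped
    elif mem_type.startswith("summary"):
        score = min(score * 1.3, 1.0)    # summary boost, capped
    elif mem_type.startswith("result"):
        score *= 0.6                     # result demotion
    if mem_type.endswith("_done"):
        score *= 0.7                     # done penalty
    return score
```

A done preference lands at 1.25 × 0.7 = 0.875 of its raw score, matching the worked example above.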
The Keyword Floor
Cross-encoder rerankers are trained on semantic similarity. They understand meaning well but consistently underscore exact-match queries — names, identifiers, project titles, and technical terms that carry high search intent but low semantic weight.
A concrete example: searching "harper" to find a memory about a person named Harper. The reranker scores the memory at 0.008 because "harper" carries almost no semantic signal — it's just a proper noun. Without intervention, this memory falls below the score floor and gets cut entirely.
The keyword floor fixes this. When query terms appear literally in a candidate's text (with word-boundary matching to prevent false positives like "how" matching "show"), the candidate's score is raised to a floor of 0.30, scaled by the fraction of query terms that matched. A single-term match on a two-term query gets a floor of 0.15. Both terms matching gets the full 0.30.
This is not a boost — it's a floor. Candidates already scoring above 0.30 are unaffected. The floor only rescues literal keyword matches that the reranker would otherwise bury. It exists because of a fundamental tension: embeddings and rerankers optimize for semantic similarity, but users often search by name. The floor bridges that gap without undermining the reranker's semantic judgments on everything else.
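The floor logic can be sketched as follows, with the 0.30 floor, word-boundary matching, and per-term scaling taken from the text (the function shape itself is an assumption):

```python
import re

FLOOR = 0.30

def apply_keyword_floor(score, query_terms, candidate_text):
    """Raise the score to a floor scaled by the fraction of query terms
    appearing literally in the text. Word boundaries prevent false
    positives like "how" matching inside "show"."""
    matched = sum(
        1 for t in query_terms
        if re.search(r"\b" + re.escape(t) + r"\b", candidate_text, re.IGNORECASE)
    )
    if not matched:
        return score
    return max(score, FLOOR * matched / len(query_terms))
```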
Smart Skip: When Reranking Is Unnecessary
When standard search returns fewer than 5 candidates, the reranker is skipped entirely. There's no value in running a pairwise relevance model on 3 results when all of them can be returned.
Instead, synthetic scores are computed from the RRF fusion scores, normalized against a theoretical maximum and modulated by cosine similarity for vector-sourced results. The scoring uses a sigmoid curve centered at 0.4, producing calibrated scores in the 0.40–0.70 range. These synthetic scores are capped at 0.70 to signal that no reranker verification occurred — a high synthetic score means "probably relevant" rather than "confirmed relevant."
Topic-only results (found by keyword hash but not by vector similarity) receive pure sigmoid scores without cosine modulation, since there is no cosine similarity to reference.
The skip saves GPU time for the obvious cases: narrow queries that match a handful of memories cleanly. Deep search never skips — if you asked for comprehensive results, you get reranked results.
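One plausible shape for the synthetic scoring, as a sketch only: the center (0.4) and cap (0.70) come from the text, but the sigmoid steepness and the exact cosine blend are assumptions, not documented constants.

```python
import math

SYNTHETIC_CAP = 0.70

def synthetic_score(rrf_score, rrf_max, cosine=None, steepness=8.0):
    """Sigmoid over the RRF score normalized against its theoretical maximum.
    Steepness and the cosine modulation are illustrative assumptions."""
    x = rrf_score / rrf_max
    s = 1.0 / (1.0 + math.exp(-steepness * (x - 0.4)))
    if cosine is not None:
        s = s * (0.5 + 0.5 * cosine)  # modulate vector-sourced results
    return min(s, SYNTHETIC_CAP)     # cap signals "no reranker verification"
```

Topic-only results call this with `cosine=None` and get the pure sigmoid, matching the behavior described above.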
Stage 7: Preview Generation
Final results are formatted as compressed previews. Each result includes an ID, a truncated content preview, relevance scores (both vector similarity and rerank score), the memory type, a timestamp, and topic tags.
A complete flag tells the AI whether the preview contains the full memory or was truncated. High-confidence results (rerank score above 0.80, content under 2,048 characters) are auto-expanded — the top 3 results for standard search and top 5 for deep search get their full text included in the preview, saving a memory_retrieve round trip.
The response format is a compact table: [results, counts, topics]. The counts string summarizes the pipeline ("Reranked X of Y, returned Z"). The topics array shows the distribution of topic tags across results, giving the AI a vocabulary for follow-up searches.
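Concretely, a response might look like the sketch below. The three-element layout and the counts wording come from the text; every field name, id, and value is invented for illustration and is not the real wire format:

```python
# Illustrative response shape; ids, field names, and values are hypothetical.
response = [
    [  # results: compressed previews
        {
            "id": "mem_abc",            # hypothetical id
            "preview": "Postgres connection string for staging...",
            "sim": 0.82,                # vector similarity
            "score": 91,                # rerank score on the 0-100 display scale
            "type": "fact",
            "ts": "2024-05-01T12:00:00Z",
            "topics": ["postgres", "infra"],
            "complete": True,           # full text included (auto-expanded)
        },
    ],
    "Reranked 64 of 64, returned 12",        # counts summary
    {"postgres": 5, "infra": 4, "auth": 3},  # topic distribution across results
]
```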
Score Interpretation
Rerank scores are on a 0–1 scale, displayed as 0–100 in results. The score reflects how well the candidate answers the specific query, not how "important" the memory is in general.
Score Ranges
- 30+ (0.30+) — excellent match. The memory directly addresses the query. Read it.
- 20–30 (0.20–0.30) — good match. Related content, likely useful. Worth retrieving if the preview looks relevant.
- Below 20 (< 0.20) — weak match. Tangentially related or matched on a shared keyword. Usually safe to skip unless you're doing exhaustive research.
A score floor of 0.03 filters out noise before results are returned. Candidates below this threshold are genuinely irrelevant — the reranker assigned near-zero confidence. When Gnosis returns nothing, the information isn't stored. Empty results are a signal, not a failure.
Privacy in the Search Path
The search pipeline operates on mathematical representations, not on your text.
What the pipeline sees and doesn't see
Embeddings are lossy mathematical projections. They encode meaning well enough for similarity matching, but the original text cannot be reconstructed from an embedding vector. The projection is one-way — many different texts can produce similar embeddings, and the mapping is not reversible.
Topic tags are stored as hashes alongside the encrypted content. The database can match hashes without knowing what the original topic text was. This enables the keyword retrieval path without exposing the tag vocabulary in plaintext indexes.
Encrypted memory content is decrypted only at the final stage, after scoring and filtering are complete, using session-derived keys that exist only in memory during active sessions. The entire retrieval pipeline — vector search, topic matching, RRF fusion, reranking, and score adjustment — runs without ever seeing the plaintext content of your memories. Only the final results that survive all filtering stages are decrypted for delivery to your AI.