memory_consolidate
After enough memories accumulate on a single topic, search starts returning fragments. Twenty memories about "auth" means twenty individual results, each capturing one decision, one bug fix, one configuration detail. A future session searching "auth" has to read all twenty, piece together the current state, and hope nothing contradicts. Consolidation solves this by compressing those fragments into one executive summary that future sessions read first.
The Fragment Problem
Memory systems accumulate knowledge incrementally. Each session adds a few memories: a decision here, a preference there, a resolved bug, a configuration path. This is correct behavior — granular memories are searchable and specific. But topics grow. After weeks of work on a project, a single topic might have 30 or 40 individual memories attached to it.
When a new session searches that topic, it gets back a ranked list of fragments. The AI reads previews, retrieves a handful in full, and tries to reconstruct a coherent understanding from pieces. It might miss a critical decision buried at rank #18. It might spend tokens reading ten memories that all say variations of the same thing. Worse, it might draw conclusions from the three memories it happened to read without knowing about the seven that would change the picture.
This is the problem consolidation exists to solve: turning a corpus of fragments into a single, authoritative briefing document.
What a Summary Contains
Summary Structure (300–500 tokens)
Every executive summary covers the same ground, written in present tense and designed to stand alone with no external context:
- What the topic is and why it exists — the one-paragraph orientation that a new session needs before anything else. Not the history of how it came to be, but what it is right now.
- Current state and key decisions made — what's deployed, what's been chosen, what architecture is in place. The decisions that would be expensive to revisit.
- Major solved problems — problems that consumed significant effort and whose solutions are non-obvious. Without this, future sessions risk re-investigating solved issues.
- Open questions — what remains unresolved, what's been deferred, what's known to be incomplete. The honest gaps.
Only one summary exists per topic at any time. Calling memory_consolidate again on the same topic replaces the existing summary. The old version is archived, not deleted.
How Summaries Surface in Search
Summaries receive a dual boost in the search pipeline to ensure they appear near the top of results when their topic is relevant.
Dual-Stage Search Boost
During the Reciprocal Rank Fusion stage, summaries receive a 1.4x RRF multiplier. This is the highest type boost in the system (preferences receive 1.3x, all other types receive none). The RRF boost ensures summaries survive the candidate pool cap and enter the reranking stage even when competing against dozens of individual memories on the same topic.
After cross-encoder reranking, summaries receive a second boost: a 1.3x post-rerank multiplier, capped at a score of 1.0. This pushes them toward the top of the final result list.
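The two stages can be sketched in a few lines. This is an illustrative reconstruction, not the actual implementation; the helper names and the RRF constant `k=60` are assumptions, while the multipliers (1.4x at RRF time, 1.3x post-rerank capped at 1.0) come from the text above.

```python
# Stage 1: type multipliers applied to the fused RRF score,
# before the candidate pool is capped.
RRF_TYPE_BOOST = {"summary": 1.4, "preference": 1.3}  # all other types: 1.0

def rrf_score(ranks, k=60):
    """Reciprocal Rank Fusion over this document's ranks from each retriever."""
    return sum(1.0 / (k + r) for r in ranks)

def fused_score(memory_type, ranks):
    return rrf_score(ranks) * RRF_TYPE_BOOST.get(memory_type, 1.0)

# Stage 2: post-rerank multiplier for summaries, capped at 1.0.
def final_score(memory_type, rerank_score):
    if memory_type == "summary":
        return min(rerank_score * 1.3, 1.0)
    return rerank_score
```

The stage-1 boost matters even though stage 2 exists: without it, a summary could be cut from the candidate pool before the cross-encoder ever scores it.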
The combined effect: when you search a topic that has an executive summary, the summary appears in the top few results. The AI reads the summary first, gets the full picture in one pass, then selectively retrieves individual memories only when it needs specific details that the summary references but doesn't fully expand.
Staleness Detection
Every summary carries a metadata footer recording how many memories existed on the topic at the time of consolidation and the date it was written. When a summary appears in search results, the system compares the recorded count to the current live count. If the topic has grown by 20% or more since the summary was written, a staleness warning is appended to the summary text, visible to the AI in the search results.
The warning tells the AI: this summary was written when the topic had N memories, and it now has N+M. Consider updating it with memory_consolidate. The AI decides whether to act on this — sometimes the new memories are minor additions that don't change the summary, and sometimes they represent a significant shift that makes the summary incomplete.
Staleness detection runs only when a summary actually appears in search results (at most 1–2 per search), so the live count query adds negligible latency to the search path.
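A minimal sketch of the staleness check, assuming the metadata footer exposes the memory count and date recorded at consolidation time; the function name and warning wording are illustrative, while the 20% threshold is from the text.

```python
STALENESS_THRESHOLD = 0.20  # 20% growth since consolidation triggers the warning

def staleness_warning(count_at_consolidation, current_count, written_on):
    """Return a warning string if the topic grew by 20% or more, else None."""
    if count_at_consolidation <= 0:
        return None
    growth = (current_count - count_at_consolidation) / count_at_consolidation
    if growth >= STALENESS_THRESHOLD:
        return (
            f"STALE: written {written_on} when the topic had "
            f"{count_at_consolidation} memories; it now has {current_count}. "
            f"Consider updating with memory_consolidate."
        )
    return None
```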
Requirements Before Calling
Consolidation is a high-stakes write operation. A summary replaces dozens of individual memories as the entry point to a topic. The requirements exist to prevent summaries built from incomplete knowledge.
Deep search first
Run memory_deep_search on the topic before consolidating. Standard search returns up to 32 results; deep search returns up to 100. The goal is comprehensive coverage, not a quick sample.
Read 20+ memories
Retrieve and read at least 20 full memories, not just their previews. Previews truncate at roughly 50 characters. A summary built from previews is a summary built from first sentences.
Extend, don't replace
If a summary already exists for this topic, retrieve it first. Extend it with new information rather than starting from scratch. The existing summary may contain context from sessions you don't have access to.
Reflect the full corpus
The summary must reflect everything stored on the topic, not just what came up in the current conversation. A session that discussed three aspects of "auth" should not write a summary that covers only those three aspects when the corpus contains twelve.
Guard Rails
Two server-side checks prevent low-quality summaries from entering the system.
Minimum Memory Count
The server counts all non-summary memories tagged with the requested topic (in either topics or macro_topics). If the count is below 10, the request is rejected with an error specifying the current count. There is nothing to summarize when a topic has 7 memories — those 7 memories are the summary. The threshold exists because consolidation is most valuable (and most dangerous) when there are enough memories that no single session can comfortably read them all.
Minimum Token Count
The submitted summary content must be at least 200 tokens (estimated as character count divided by 4). A 150-token summary is too thin to cover the required structure: what the topic is, current state, key decisions, solved problems, and open questions. A summary that short is almost certainly missing critical sections. The 200-token floor forces a minimum level of substance. The recommended range is 300–500 tokens.
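Both guard rails are simple enough to sketch together. The function and error wording are hypothetical; the thresholds (10 memories, 200 tokens) and the chars-divided-by-4 token estimate are from the text.

```python
MIN_MEMORIES = 10
MIN_TOKENS = 200

def estimate_tokens(text: str) -> int:
    # Token count is estimated as character count divided by 4.
    return len(text) // 4

def validate_consolidation(topic_memory_count: int, summary_text: str) -> None:
    """Raise ValueError if either server-side guard rail is violated."""
    if topic_memory_count < MIN_MEMORIES:
        raise ValueError(
            f"Topic has only {topic_memory_count} memories; "
            f"at least {MIN_MEMORIES} are required to consolidate."
        )
    tokens = estimate_tokens(summary_text)
    if tokens < MIN_TOKENS:
        raise ValueError(
            f"Summary is ~{tokens} tokens; the minimum is {MIN_TOKENS}."
        )
```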
Archival and Restoration
When a new summary replaces an existing one, the old summary is not deleted. It is reclassified to type summary_archived and its topic tags are cleared. This means archived summaries do not appear in normal topic searches — they won't compete with the current summary or pollute results with outdated information. But they remain in the database and can be found with a targeted search.
Restoring a Previous Summary
To restore an archived summary: search with type_filter="summary_archived", retrieve the old content, then call memory_consolidate with that content. This creates a new summary containing the old text (which archives the current summary in turn). The process is symmetrical — restoration is just another consolidation call.
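The replace-and-archive flow can be sketched against a toy in-memory store. The `type` and `topics` fields follow the text; the store shape and function name are assumptions.

```python
def consolidate(store, topic, new_content):
    """Replace the topic's active summary, archiving the previous one."""
    for mem in store:
        if mem["type"] == "summary" and topic in mem["topics"]:
            mem["type"] = "summary_archived"  # reclassify, never delete
            mem["topics"] = []  # clear tags so it won't surface in topic search
    store.append({"type": "summary", "topics": [topic], "content": new_content})

# Restoration is just another consolidation call with archived content:
#   old = search(store, type_filter="summary_archived", ...)
#   consolidate(store, topic, old["content"])
```

Because the replacement loop archives before appending, the invariant holds after every call: exactly one active summary per topic, with the chain of archived versions behind it.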
This matters when a consolidation goes wrong. If a summary was written from incomplete knowledge and introduces inaccuracies, the previous version can be restored while the problem is investigated. Consolidation replaces, never appends, so there is always exactly one active summary per topic and a chain of archived versions behind it.
The Poisoning Risk
Why the Requirements Are Non-Negotiable
A bad summary poisons every future session's understanding of that topic. Summaries appear near the top of search results. If a summary contains incorrect information, every AI that searches that topic will start from a wrong premise. It will make decisions based on that wrong premise. It will store new memories that build on that wrong premise. The error compounds.
The requirements — deep search first, 20+ memories read, extend existing summaries, reflect the full corpus — exist to prevent consolidation from partial knowledge. A session that has seen 8 of 40 memories on a topic does not have enough context to write a summary that the other 32 memories would agree with. The guard rails (10+ memories required, 200+ tokens minimum) are server-enforced backstops, but the real protection is in the protocol: the tool description instructs the AI to do the research before it writes.
This is a deliberate trade-off. Consolidation is harder to invoke than a simple memory_add. The friction is the feature. Low-effort summaries are worse than no summary at all, because no summary means the AI reads individual memories and forms its own understanding. A wrong summary means the AI reads the summary, trusts it, and never looks at the individual memories that would correct it.
When to Consolidate
Consolidation is most useful when a topic has crossed the threshold where reading individual memories is no longer practical. Some signals:
- A topic has accumulated 15+ memories across multiple sessions, and new sessions spend time re-reading old context.
- The AI's memory_deep_search on the topic returns results that overlap significantly — multiple memories saying similar things from different sessions.
- The staleness warning appears on an existing summary, indicating that the topic has grown 20% or more since the last consolidation.
- A major architectural decision or project milestone has been reached, and the pre-milestone memories are now historical context rather than active concerns.
Consolidation is not mandatory. Topics with fewer than 10 memories don't need it and the server won't allow it. Topics that are small enough to scan quickly in search results are fine without a summary. The tool exists for the topics that have grown past that point.