
Proactive Surfacing

New here? Start with the STM Overview to see the full pipeline in context first.

Traditional RAG surfaces relevant information only when the agent explicitly requests a search. memtomem-stm's proactive surfacing instead observes proxied MCP calls, infers what the agent is working on, and automatically injects matching memories from LTM into the response, with no explicit query needed. In v0.1.24, mms hook extends the same surfacing path to PostToolUse events from supported Claude Code native tools, delivering the memories as additionalContext.

When an agent calls an MCP tool, the STM proxy runs this pipeline:

Tool call → Context extraction → LTM search → Relevance gating → Inject into response

No agent code changes are needed: routing MCP traffic through the STM proxy is enough to enable automatic memory injection. For Claude Code's built-in tools, install mms hook as a host hook; by default it uses a warm local daemon, so repeated hook calls do not pay the LTM cold-start cost.
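
For the Claude Code path, here is the general shape of a PostToolUse hook that returns memories as additionalContext. This is not mms hook's implementation: search_ltm is a hypothetical stand-in for the daemon lookup, the query heuristic is an assumption, and the stdin/stdout field names (hook_event_name, tool_name, tool_input, hookSpecificOutput.additionalContext) follow our reading of Claude Code's hooks reference, so check them against your installed version.

```python
from __future__ import annotations

import json
import sys


def search_ltm(query: str) -> list[str]:
    """Hypothetical stand-in for the LTM lookup (mms hook goes through a warm daemon)."""
    return [f"memory related to {query!r}"] if query else []


def build_hook_output(event: dict) -> dict | None:
    """Turn a PostToolUse event into a hook response carrying additionalContext."""
    if event.get("hook_event_name") != "PostToolUse":
        return None
    tool_input = event.get("tool_input") or {}
    # Heuristic query choice (assumption): file path for Read/Edit, command for Bash.
    query = tool_input.get("file_path") or tool_input.get("command") or event.get("tool_name", "")
    memories = search_ltm(query)
    if not memories:
        return None  # print nothing, so the tool result is left as-is
    return {
        "hookSpecificOutput": {
            "hookEventName": "PostToolUse",
            "additionalContext": "\n".join(memories),
        }
    }


def main() -> None:
    # Claude Code sends the hook event as JSON on stdin.
    response = build_hook_output(json.load(sys.stdin))
    if response is not None:
        print(json.dumps(response))
```

Registered as a PostToolUse command hook, main() would read one event and print JSON only when there is something to surface.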

STM needs a search query before it can ask LTM for memories. Instead of relying on a single signal, it runs a five-pass pipeline — each pass tries a different source, and the first one that produces a usable query wins. That way a tool call with a clean _context_query argument is used directly, while a bare call like fs__read_file(path=...) still produces something searchable.

| Priority | Method | Description |
|---|---|---|
| 1 | Tool-specific query template | Pre-defined query patterns mapped to tool names |
| 2 | _context_query argument | Explicit search query passed by the agent |
| 3 | Path arguments | Dedicated tokenization for path / file / filepath / file_path / filename keys (split on separators, drop extensions) |
| 4 | Semantic keys | Keyword combination from query / search / url / description and similar argument values |
| 5 | Tool name | Last resort: use the tool name itself as the query |
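
A minimal sketch of the five-pass cascade, assuming tool arguments arrive as a plain dict. TOOL_TEMPLATES and its single entry are hypothetical, and the tokenizer's separator set (slashes, dots, underscores, hyphens) is an assumption about what "split on separators" covers.

```python
from __future__ import annotations

import os
import re

# Hypothetical template table for pass 1: tool name -> query template.
TOOL_TEMPLATES = {"gh__get_issue": "issue {repo} {number}"}
PATH_KEYS = ("path", "file", "filepath", "file_path", "filename")
SEMANTIC_KEYS = ("query", "search", "url", "description")


def tokenize_path(value: str) -> str:
    # Drop the extension, then split on path separators, dots, underscores, hyphens.
    stem, _ext = os.path.splitext(value)
    return " ".join(p for p in re.split(r"[\\/._\-]+", stem) if p)


def extract_query(tool: str, args: dict) -> str:
    # Pass 1: tool-specific query template.
    template = TOOL_TEMPLATES.get(tool)
    if template:
        try:
            return template.format(**args)
        except (KeyError, IndexError):
            pass  # template needs an argument this call lacks
    # Pass 2: explicit _context_query from the agent.
    explicit = args.get("_context_query")
    if isinstance(explicit, str) and explicit.strip():
        return explicit.strip()
    # Pass 3: path-like arguments get dedicated tokenization.
    for key in PATH_KEYS:
        value = args.get(key)
        if isinstance(value, str) and tokenize_path(value):
            return tokenize_path(value)
    # Pass 4: combine semantic argument values.
    words = [args[k].strip() for k in SEMANTIC_KEYS if isinstance(args.get(k), str) and args[k].strip()]
    if words:
        return " ".join(words)
    # Pass 5: last resort, the tool name itself.
    return tool
```

Because each pass returns as soon as it has something usable, a bare fs__read_file(path="/src/auth/login_handler.py") call still yields "src auth login handler", and the tool name is used only when every other source comes up empty.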

Before anything is injected, surfacing passes through five gating stages:

  1. Context extraction — Can a meaningful query be generated from the tool call?
  2. Relevance assessment — Is the extracted query suitable for memory search?
  3. LTM search — Hybrid search for candidate memories
  4. Score filtering — Remove results below the min_score threshold
  5. Deduplication — In-session + cross-session (7-day) duplicate prevention
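
The five stages above can be sketched as a single function. Memory, Deduper, and gate are hypothetical names; the stage 2 relevance check is a placeholder heuristic (queries shorter than three characters are rejected), and a real deployment would persist the cross-session map rather than hold it in memory. Only the min_score cutoff and the 7-day TTL come from this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

DEDUP_TTL_SECONDS = 604_800  # documented default: 7 days


@dataclass
class Memory:
    id: str
    text: str
    score: float


@dataclass
class Deduper:
    # In-session: ids already injected in this session.
    session_seen: set[str] = field(default_factory=set)
    # Cross-session: id -> last surfaced timestamp (persisted in a real deployment).
    cross_session: dict[str, float] = field(default_factory=dict)

    def is_duplicate(self, mem_id: str, now: float) -> bool:
        if mem_id in self.session_seen:
            return True
        last = self.cross_session.get(mem_id)
        return last is not None and now - last < DEDUP_TTL_SECONDS

    def mark(self, mem_id: str, now: float) -> None:
        self.session_seen.add(mem_id)
        self.cross_session[mem_id] = now


def gate(
    query: str | None,
    search: Callable[[str], list[Memory]],
    min_score: float,
    dedup: Deduper,
    now: float,
) -> list[Memory]:
    # 1. Context extraction: no usable query, nothing to surface.
    if not query:
        return []
    # 2. Relevance assessment: placeholder heuristic, reject trivially short queries.
    if len(query.strip()) < 3:
        return []
    # 3. LTM search (hybrid search in memtomem; any callable here).
    candidates = search(query)
    # 4. Score filtering: drop anything below min_score.
    kept = [m for m in candidates if m.score >= min_score]
    # 5. Deduplication: skip memories surfaced this session or within the TTL.
    result = []
    for m in kept:
        if not dedup.is_duplicate(m.id, now):
            dedup.mark(m.id, now)
            result.append(m)
    return result
```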

How surfaced memories are stitched into the response is controlled by MEMTOMEM_STM_SURFACING__INJECTION_MODE:

| Mode | Behavior |
|---|---|
| append (default) | Memories appended below the response. Preserves progressive-delivery offsets and works on eligible continuation paths. |
| prepend | Memories prepended as a header. Skipped on progressive delivery because it would shift stm_proxy_read_more offsets. |
| section | Memories placed in a dedicated section. Triggers surfacing on progressive continuations (F6). |
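
A sketch of how the three modes differ, assuming the response and memories are plain strings. The bullet formatting and the "Relevant memories" heading are hypothetical, and the 3000-character cap mirrors the documented injection size default. Notice that append leaves the original text as an unchanged prefix, which is why its progressive-delivery offsets stay valid, while prepend would push every offset forward and is therefore skipped.

```python
from __future__ import annotations

MAX_INJECTION_CHARS = 3000  # documented default injection size cap


def render(memories: list[str]) -> str:
    # Hypothetical formatting: one bullet per memory, truncated to the cap.
    return "\n".join(f"- {m}" for m in memories)[:MAX_INJECTION_CHARS]


def inject(response: str, memories: list[str], mode: str = "append", progressive: bool = False) -> str:
    if not memories:
        return response
    block = render(memories)
    if mode == "append":
        # Original text stays an unchanged prefix, so read_more offsets remain valid.
        return f"{response}\n\n{block}"
    if mode == "prepend":
        if progressive:
            # Prepending would shift every stm_proxy_read_more offset, so skip.
            return response
        return f"{block}\n\n{response}"
    if mode == "section":
        # Hypothetical section heading; the real delimiter may differ.
        return f"{response}\n\n## Relevant memories\n{block}"
    raise ValueError(f"unknown injection mode: {mode!r}")
```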

Surfacing automatically scales to the agent's context window size:

| Context window | Compression | Injection size | Result count |
|---|---|---|---|
| ≤ 32K | High compression | Small | Few |
| 32K – 200K | Default | Medium | Default |
| > 200K | Low compression | Large | Many |
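
The scaling table maps onto a simple tier lookup. The page gives only qualitative tiers, so this sketch returns labels rather than numbers; SurfacingBudget and budget_for are hypothetical names, and reading 32K as 32,000 tokens (rather than 32,768) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SurfacingBudget:
    compression: str
    injection_size: str
    result_count: str


def budget_for(context_window: int) -> SurfacingBudget:
    # Tiers follow the table: <= 32K, 32K-200K (200K inclusive), > 200K tokens.
    if context_window <= 32_000:
        return SurfacingBudget("high", "small", "few")
    if context_window <= 200_000:
        return SurfacingBudget("default", "medium", "default")
    return SurfacingBudget("low", "large", "many")
```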

When an agent rates a surfaced result, the auto-tuner uses the rating to adjust per-tool relevance thresholds (min_score):

  • helpful → Maintain or lower min_score for that tool
  • partially_helpful → Count as neutral evidence
  • not_relevant → Raise min_score (stricter filtering)
  • already_known → Count as negative feedback and feed local demotion / dedup behavior
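
The feedback rules can be sketched as a small per-tool tuner. DEFAULT_MIN_SCORE, STEP, FLOOR, and CEIL are hypothetical values, not memtomem-stm defaults; the demotion threshold of 3 distinct events matches the documented feedback_demotion_negative_threshold default. This sketch always lowers min_score on helpful, and leaving the tool threshold unchanged on already_known is an interpretive choice, since the page only says it counts as negative feedback.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical tuning constants (not memtomem-stm defaults).
DEFAULT_MIN_SCORE = 0.30
STEP = 0.02
FLOOR, CEIL = 0.10, 0.90
DEMOTION_THRESHOLD = 3  # documented feedback_demotion_negative_threshold default


class AutoTuner:
    def __init__(self) -> None:
        self.min_score: defaultdict[str, float] = defaultdict(lambda: DEFAULT_MIN_SCORE)
        # memory id -> distinct event ids that rated it negatively
        self.negative_events: defaultdict[str, set[str]] = defaultdict(set)

    def rate(self, tool: str, memory_id: str, rating: str, event_id: str) -> None:
        current = self.min_score[tool]
        if rating == "helpful":
            # Lower the bar for this tool.
            self.min_score[tool] = max(FLOOR, current - STEP)
        elif rating == "partially_helpful":
            pass  # neutral evidence: no change
        elif rating == "not_relevant":
            # Stricter filtering for this tool, plus a negative mark on the memory.
            self.min_score[tool] = min(CEIL, current + STEP)
            self.negative_events[memory_id].add(event_id)
        elif rating == "already_known":
            # Negative for the memory; tool threshold left alone (interpretive choice).
            self.negative_events[memory_id].add(event_id)
        else:
            raise ValueError(f"unknown rating: {rating!r}")

    def is_demoted(self, memory_id: str) -> bool:
        # Filtered before injection once enough distinct negative events accumulate.
        return len(self.negative_events.get(memory_id, ())) >= DEMOTION_THRESHOLD
```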

Several safeguards keep surfacing fast and bounded:

  • Circuit breaker (3-state: closed / open / half-open): Opens after circuit_max_failures consecutive failures (default 3) and transitions to half-open after circuit_reset_seconds (default 60s)
  • Surfacing timeout: 3s hard ceiling per call
  • Rate limit: 15 calls / minute ceiling across all tools
  • Write-tool skip: Disables surfacing for tools with side effects (file writes, deletes)
  • Query cooldown: Skips surfacing when the extracted query has Jaccard similarity > 0.95 with one seen in the last 5 seconds
  • Cross-session dedup: Default TTL 604800s (7 days), set via MEMTOMEM_STM_SURFACING__DEDUP_TTL_SECONDS
  • Injection size cap: Default 3000 chars per injection
  • Local feedback demotion: Memories repeatedly rated not_relevant or already_known are filtered out before injection once they cross feedback_demotion_negative_threshold (default 3 distinct events)
  • Query-text privacy: query_retention_days clears persisted raw query text after 30 days by default, and persist_query_text=false stores a sha256: digest instead of the raw query
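
Three of these safeguards (the circuit breaker, the query cooldown, and the sha256: digest used when persist_query_text=false) are sketched below with an injectable clock so they can be tested. The class and function names are hypothetical; the defaults (3 failures, 60 s reset, Jaccard > 0.95 within 5 s) come from the list, while computing Jaccard similarity over lowercase whitespace tokens is an assumption about how queries are compared. The half-open state here admits every call until the next recorded outcome, which is looser than a breaker that allows a single probe.

```python
from __future__ import annotations

import hashlib
import time
from typing import Callable

Clock = Callable[[], float]


class CircuitBreaker:
    """closed -> open after max_failures consecutive failures; open -> half-open after reset_seconds."""

    def __init__(self, max_failures: int = 3, reset_seconds: float = 60.0, clock: Clock = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_seconds = reset_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at: float | None = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_seconds:
            return "half-open"
        return "open"

    def allow(self) -> bool:
        # Calls are refused only while open; half-open lets trial calls through.
        return self.state != "open"

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None  # success closes the breaker
            return
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.max_failures:
            self.opened_at = self.clock()  # (re)open and restart the reset timer


def jaccard(a: str, b: str) -> float:
    # Assumption: similarity over lowercase whitespace-separated tokens.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


class QueryCooldown:
    def __init__(self, window: float = 5.0, threshold: float = 0.95, clock: Clock = time.monotonic) -> None:
        self.window, self.threshold, self.clock = window, threshold, clock
        self.recent: list[tuple[float, str]] = []

    def should_skip(self, query: str) -> bool:
        now = self.clock()
        # Forget queries older than the window, then compare against the rest.
        self.recent = [(t, q) for t, q in self.recent if now - t <= self.window]
        skip = any(jaccard(query, q) > self.threshold for _, q in self.recent)
        self.recent.append((now, query))
        return skip


def stored_query_text(query: str, persist_query_text: bool = True) -> str:
    # persist_query_text=false keeps only a sha256: digest of the query.
    if persist_query_text:
        return query
    return "sha256:" + hashlib.sha256(query.encode("utf-8")).hexdigest()
```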

STM talks to LTM over MCP. The default transport spawns memtomem-server over stdio, and v0.1.24 also supports long-running LTM services over sse or streamable_http:

```sh
export MEMTOMEM_STM_SURFACING__LTM_MCP_TRANSPORT=streamable_http
export MEMTOMEM_STM_SURFACING__LTM_MCP_URL=https://ltm.example/mcp
export MEMTOMEM_STM_SURFACING__LTM_MCP_HEADERS='{"Authorization":"Bearer ..."}'
```
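
A sketch of how these double-underscore variable names could map onto nested settings and be sanity-checked. The double-underscore delimiter resembles the env_nested_delimiter convention in pydantic-settings, but whether memtomem-stm parses its environment exactly this way is an assumption; load_nested and validate_surfacing are hypothetical helpers, and requiring a URL for the network transports is also an assumption.

```python
from __future__ import annotations

import json
import os
from collections.abc import Mapping

PREFIX = "MEMTOMEM_STM_"
VALID_TRANSPORTS = {"stdio", "sse", "streamable_http"}


def load_nested(environ: Mapping[str, str] | None = None) -> dict:
    """Map MEMTOMEM_STM_A__B=value onto {"a": {"b": value}}."""
    environ = os.environ if environ is None else environ
    config: dict = {}
    for key, raw in environ.items():
        if not key.startswith(PREFIX):
            continue
        *parents, leaf = key[len(PREFIX):].lower().split("__")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        # Decode JSON-looking values such as the headers map; keep others as strings.
        value: object = raw
        if raw.startswith(("{", "[")):
            try:
                value = json.loads(raw)
            except json.JSONDecodeError:
                pass
        node[leaf] = value
    return config


def validate_surfacing(config: dict) -> str:
    """Return the LTM transport, defaulting to stdio, and check it is usable."""
    surfacing = config.get("surfacing", {})
    transport = surfacing.get("ltm_mcp_transport", "stdio")
    if transport not in VALID_TRANSPORTS:
        raise ValueError(f"unsupported LTM transport: {transport!r}")
    if transport != "stdio" and not surfacing.get("ltm_mcp_url"):
        raise ValueError(f"{transport} needs MEMTOMEM_STM_SURFACING__LTM_MCP_URL")
    return transport
```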

LTM responses are consumed by the surfacing engine and bypass the proxy compression/cache pipeline.

A trace_id is threaded through the surfacing and progressive-delivery path so follow-up reads correlate with the initial chunk in Langfuse (or any OpenTelemetry-style tracer).
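
A sketch of the correlation idea: the trace_id is minted once alongside the first chunk and returned unchanged by every follow-up read, so a tracer can group them. ProgressiveStore, Chunk, and the chunk size are hypothetical, and no actual Langfuse or OpenTelemetry calls are shown.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass
class Chunk:
    trace_id: str
    offset: int
    text: str
    has_more: bool


class ProgressiveStore:
    """Hypothetical store: keeps each full response with the trace_id minted for it."""

    def __init__(self, chunk_size: int = 4000) -> None:
        self.chunk_size = chunk_size
        self._responses: dict[str, tuple[str, str]] = {}  # handle -> (trace_id, full text)

    def first_chunk(self, text: str) -> tuple[str, Chunk]:
        handle, trace_id = uuid.uuid4().hex, uuid.uuid4().hex
        self._responses[handle] = (trace_id, text)
        return handle, self.read_more(handle, 0)

    def read_more(self, handle: str, offset: int) -> Chunk:
        # Follow-up reads reuse the stored trace_id, so spans correlate.
        trace_id, text = self._responses[handle]
        end = offset + self.chunk_size
        return Chunk(trace_id, offset, text[offset:end], end < len(text))
```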