
Overview

memtomem-stm is a short-term memory (STM) proxy that sits between your AI agent and your existing MCP servers. Without any agent-side code changes, it adds response compression and proactive memory injection to every tool call — typically cutting token use by 20–80%.

The current PyPI release is memtomem-stm v0.1.24. It adds mms hook for Claude Code native-tool surfacing, a warm local daemon for hook calls, network LTM MCP transport, query-aware compression, relevance buckets, richer feedback, and query-text privacy controls.

  • MCP tool responses keep blowing your context window: filesystem or GitHub MCP servers often return 8,000-token payloads. STM compresses them to ~2,000 with a strategy picked for the content type.
  • You want memories auto-injected without the agent having to ask: with LTM alone, the agent has to call mem_search. With STM in front, relevant memories ride along with every tool response, no explicit query needed.
  • You want Claude Code native-tool context to see memories too: mms hook can add additionalContext for read-like PostToolUse events, using a daemon-warmed LTM connection by default.
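
To put the compression figures above in context, here is a small arithmetic sketch. The helper name `reduction_pct` is illustrative and not part of memtomem-stm; it only shows how the 8,000 → ~2,000 token example relates to the quoted 20–80% range.

```python
def reduction_pct(before_tokens: int, after_tokens: int) -> float:
    """Percentage of tokens removed by compression (illustrative helper)."""
    if before_tokens <= 0:
        raise ValueError("before_tokens must be positive")
    return 100.0 * (before_tokens - after_tokens) / before_tokens


# Figures from the first bullet: ~8,000 tokens compressed to ~2,000.
saving = reduction_pct(8_000, 2_000)
print(f"{saving:.0f}% fewer tokens")  # prints "75% fewer tokens"
```

That example lands near the upper end of the typical range; actual savings depend on the payload and the strategy chosen.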
uv tool install memtomem-stm # 1. install
mms init --mcp claude # 2. register upstream + Claude Code (one step)
mms health # 3. verify connectivity

mms init prompts for an upstream server and then registers memtomem-stm with your MCP client of choice (--mcp claude, --mcp json, or --mcp skip). See Quick Start for the full setup walkthrough.

  • Proactive Surfacing: every tool call runs candidate memories through 5 relevance checks (context extraction → query suitability → LTM search → score threshold → dedup window) before anything is injected. See Proactive Surfacing.
  • Response Compression: one of 10 strategies is selected automatically based on content type (JSON, Markdown, API docs, free text, …), with query-aware ranking and safer JSON output tiers. See Compression Strategies.
  • Hook + Daemon Path: mms hook bridges supported host PostToolUse payloads into STM surfacing, while mms daemon keeps the LTM session warm so hook calls avoid repeated cold starts.
  • Optional INDEX hooks: Stage 4 config exists for library integrations, but the standalone mms server does not wire an index engine yet; auto_index and extraction are inert there until the MCP-only adapter lands.
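
The five relevance checks listed under Proactive Surfacing can be pictured as a chain of early exits. The sketch below is a simplified illustration, not the memtomem-stm implementation: the `Memory` and `SurfacingGate` classes, the `min_score` and `dedup_window` defaults, and the two-word suitability rule are all assumptions made for the example, and `search` stands in for whatever LTM query the proxy actually issues.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Memory:
    id: str
    text: str
    score: float


@dataclass
class SurfacingGate:
    search: Callable[[str], list[Memory]]  # stand-in for the LTM search call
    min_score: float = 0.5                 # hypothetical score threshold
    dedup_window: int = 3                  # hypothetical: calls during which repeats are suppressed
    _recent: dict[str, int] = field(default_factory=dict)
    _call_no: int = 0

    def surface(self, tool_args: dict) -> list[Memory]:
        self._call_no += 1
        # 1. Context extraction: build a query from the string arguments.
        query = " ".join(v for v in tool_args.values() if isinstance(v, str)).strip()
        # 2. Query suitability: skip queries too short to search meaningfully.
        if len(query.split()) < 2:
            return []
        # 3. LTM search.
        candidates = self.search(query)
        # 4. Score threshold.
        candidates = [m for m in candidates if m.score >= self.min_score]
        # 5. Dedup window: drop memories injected within the last few calls.
        fresh = []
        for m in candidates:
            last = self._recent.get(m.id)
            if last is not None and self._call_no - last <= self.dedup_window:
                continue
            self._recent[m.id] = self._call_no
            fresh.append(m)
        return fresh


def fake_search(query: str) -> list[Memory]:
    # Canned results so the example runs without an LTM server.
    return [Memory("m1", "Repo uses pnpm", 0.9), Memory("m2", "Unrelated note", 0.2)]


gate = SurfacingGate(search=fake_search)
first = gate.surface({"path": "docs/setup guide.md"})   # m1 passes every check
second = gate.surface({"path": "docs/setup guide.md"})  # m1 is inside the dedup window
print([m.id for m in first], [m.id for m in second])
```

Ordering the cheap checks before the search keeps unsuitable calls from ever reaching LTM, which is the point of gating before injection.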
AI Agent
↕ MCP protocol
memtomem-stm (STM Proxy)
├── ↕ Surfacing queries → memtomem (LTM)
└── ↕ Proxied calls → Upstream MCP Servers
(filesystem, GitHub, …)

STM runs every MCP tool call through a 4-stage pipeline:

  1. CLEAN — normalize the request (strip noise, unify format)
  2. COMPRESS — shrink the response (auto-select from 10 strategies)
  3. SURFACE — pull relevant memories from LTM and inject them (5-level gating)
  4. INDEX — optional write-back hook for future LTM accumulation; inert in the standalone mms server today
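
As a rough mental model, the four stages above can be sketched as a single function wrapped around the upstream call. Everything here is an assumption for illustration: `run_pipeline`, `detect_content_type`, the three toy strategies (the real proxy has 10), and the `[memories]` marker are hypothetical names, not the memtomem-stm API.

```python
import json
from typing import Callable


def detect_content_type(text: str) -> str:
    """Crude content sniffing: JSON, Markdown, or free text."""
    try:
        json.loads(text)
        return "json"
    except ValueError:
        pass
    if any(line.lstrip().startswith("#") for line in text.splitlines()):
        return "markdown"
    return "text"


def compress_json(text: str, budget: int) -> str:
    # Drop insignificant whitespace only, so the output stays valid JSON.
    return json.dumps(json.loads(text), separators=(",", ":"))


def compress_markdown(text: str, budget: int) -> str:
    # Over budget: keep only the heading outline.
    if len(text) <= budget:
        return text
    headings = [ln for ln in text.splitlines() if ln.lstrip().startswith("#")]
    return "\n".join(headings)[:budget]


def compress_text(text: str, budget: int) -> str:
    return text[:budget]


STRATEGIES: dict[str, Callable[[str, int], str]] = {
    "json": compress_json,
    "markdown": compress_markdown,
    "text": compress_text,
}


def run_pipeline(call_upstream, surface, tool_args: dict, budget: int = 200) -> str:
    # 1. CLEAN: normalize the request (here: trim string arguments).
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in tool_args.items()}
    # Proxied call to the upstream MCP server (stand-in callable).
    raw = call_upstream(cleaned)
    # 2. COMPRESS: pick a strategy from the detected content type.
    body = STRATEGIES[detect_content_type(raw)](raw, budget)
    # 3. SURFACE: append any memories that passed gating.
    memories = surface(cleaned)
    if memories:
        body += "\n\n[memories]\n" + "\n".join(f"- {m}" for m in memories)
    # 4. INDEX: optional write-back; a no-op here, mirroring the standalone server.
    return body


result = run_pipeline(
    call_upstream=lambda args: '{ "name": "demo",  "files": [1, 2, 3] }',
    surface=lambda args: ["Repo uses pnpm"],
    tool_args={"path": "  package.json  "},
)
print(result)
```

The agent only ever sees the final string, which is why no agent-side code changes are needed.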

STM and LTM are independent packages — no Python dependency between them. They communicate only via MCP protocol, and each can be deployed and upgraded separately.

                 LTM (memtomem)                STM (memtomem-stm)
Role             Persistent storage & search   Real-time proxy & compression
Required?        Yes (core)                    Optional
Communication    Direct MCP server             MCP proxy → queries LTM

PyPI             memtomem-stm
Latest release   0.1.24
CLI              mms
License          Apache 2.0
GitHub           memtomem/memtomem-stm