Overview
What is memtomem-stm?
memtomem-stm is a short-term memory (STM) proxy that sits between your AI agent and your existing MCP servers. Without any agent-side code changes, it adds response compression and proactive memory injection to every tool call, typically cutting token use by 20–80%.
The current PyPI release is memtomem-stm v0.1.24. It adds `mms hook` for Claude Code native-tool surfacing, a warm local daemon for hook calls, network LTM MCP transport, query-aware compression, relevance buckets, richer feedback, and query-text privacy controls.
Use It When
- MCP tool responses keep blowing your context window. Filesystem or GitHub MCP servers often return 8,000-token payloads; STM compresses them to ~2,000 with a strategy picked for the content type.
- You want memories auto-injected without the agent having to ask. With LTM alone, the agent has to call `mem_search`. With STM in front, relevant memories ride along with every tool response, no explicit query needed.
- You want Claude Code native-tool context to see memories too. `mms hook` can add `additionalContext` for read-like PostToolUse events, using a daemon-warmed LTM connection by default.
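As a quick sanity check on the figures in the first bullet, the saving from compressing an 8,000-token payload to roughly 2,000 tokens can be computed directly. The helper name `token_savings` is illustrative only and is not part of memtomem-stm.

```python
def token_savings(original_tokens: int, compressed_tokens: int) -> float:
    """Return the fraction of tokens removed by compression (0.0 to 1.0)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 1.0 - compressed_tokens / original_tokens


# The example from the list above: 8,000 tokens compressed to about 2,000.
savings = token_savings(8_000, 2_000)
print(f"{savings:.0%}")  # prints "75%"
```

A 75% saving sits near the top of the 20–80% range quoted earlier; smaller or already-compact responses would land lower.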
Start in 3 Steps

    uv tool install memtomem-stm   # 1. install
    mms init --mcp claude          # 2. register upstream + Claude Code (one step)
    mms health                     # 3. verify connectivity

`mms init` prompts for an upstream server and then registers memtomem-stm with your MCP client of choice (`--mcp claude`, `--mcp json`, or `--mcp skip`). Full setup walkthrough in Quick Start.
Core Capabilities
- Proactive Surfacing: Every tool call runs candidate memories through 5 relevance checks (context extraction → query suitability → LTM search → score threshold → dedup window) before anything is injected. See Proactive Surfacing.
- Response Compression: One of 10 strategies is selected automatically based on content type (JSON, Markdown, API docs, free text, …), with query-aware ranking and safer JSON output tiers. See Compression Strategies.
- Hook + Daemon Path: `mms hook` bridges supported host PostToolUse payloads into STM surfacing, while `mms daemon` keeps the LTM session warm so hook calls avoid repeated cold starts.
- Optional INDEX hooks: Stage 4 config exists for library integrations, but the standalone `mms` server does not wire an index engine yet; `auto_index` and extraction are inert there until the MCP-only adapter lands.
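The five relevance checks named under Proactive Surfacing can be pictured as a chain of early exits and filters. The sketch below is a minimal illustration, not memtomem-stm's implementation: `Memory`, `SurfacingGate`, the word-count rule, the `0.5` threshold, and the window size are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Memory:
    id: str
    text: str
    score: float


@dataclass
class SurfacingGate:
    """Hypothetical five-check gate; every threshold here is illustrative."""

    search: Callable[[str], list[Memory]]  # stand-in for an LTM search call
    min_query_words: int = 3               # assumed query-suitability rule
    min_score: float = 0.5                 # assumed score threshold
    dedup_window: int = 10                 # remember the last N injected ids
    _recent: list[str] = field(default_factory=list)

    def surface(self, tool_args: dict) -> list[Memory]:
        # 1. Context extraction: build a query from string-valued arguments.
        query = " ".join(v for v in tool_args.values() if isinstance(v, str)).strip()
        # 2. Query suitability: skip queries too short to search meaningfully.
        if len(query.split()) < self.min_query_words:
            return []
        # 3. LTM search.
        candidates = self.search(query)
        # 4. Score threshold: keep only sufficiently relevant memories.
        candidates = [m for m in candidates if m.score >= self.min_score]
        # 5. Dedup window: drop memories that were injected recently.
        fresh = [m for m in candidates if m.id not in self._recent]
        self._recent = (self._recent + [m.id for m in fresh])[-self.dedup_window:]
        return fresh


def fake_search(query: str) -> list[Memory]:
    # Canned results standing in for a real LTM backend.
    return [Memory("m1", "uses uv for installs", 0.9), Memory("m2", "unrelated", 0.2)]


gate = SurfacingGate(search=fake_search)
first = gate.surface({"path": "docs/install guide for uv"})
second = gate.surface({"path": "docs/install guide for uv"})
print([m.id for m in first])   # ['m1']
print([m.id for m in second])  # []
```

The second call returns nothing because `m1` is still inside the dedup window, which is the behavior that keeps the same memory from being injected on every consecutive tool call.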
How It Works

    AI Agent
        ↕ MCP protocol
    memtomem-stm (STM Proxy)
        ├── ↕ Surfacing queries → memtomem (LTM)
        └── ↕ Proxied calls → Upstream MCP Servers (filesystem, GitHub, …)

STM runs every MCP tool call through a 4-stage pipeline:
- CLEAN: normalize the request (strip noise, unify format)
- COMPRESS: shrink the response (auto-select from 10 strategies)
- SURFACE: pull relevant memories from LTM and inject them (5-level gating)
- INDEX: optional write-back hook for future LTM accumulation; inert in the standalone `mms` server today
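The four stages above can be sketched as a single function that wraps an upstream call. This is a rough illustration under stated assumptions, not the package's code: `run_pipeline`, `clean`, `detect_content_type`, the three stand-in strategies, and the `[memories]` marker are hypothetical names chosen for the example, and the real content-type detection and the 10 strategies are not reproduced here.

```python
import json
from typing import Callable


def clean(args: dict) -> dict:
    """CLEAN: trim whitespace from string arguments (an assumed normalization)."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in args.items()}


def detect_content_type(text: str) -> str:
    """Rough content sniffing; the real detection rules are not described here."""
    try:
        json.loads(text)
        return "json"
    except ValueError:
        pass
    if any(line.startswith("#") for line in text.splitlines()):
        return "markdown"
    return "text"


def compress_json(text: str) -> str:
    """Re-serialize JSON without insignificant whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"))


def compress_markdown(text: str) -> str:
    """Keep only heading lines, producing an outline."""
    return "\n".join(line for line in text.splitlines() if line.startswith("#"))


def truncate(text: str, limit: int = 200) -> str:
    """Cut free text to a fixed character budget."""
    return text if len(text) <= limit else text[:limit] + "..."


# Three stand-in strategies keyed by detected content type.
STRATEGIES: dict[str, Callable[[str], str]] = {
    "json": compress_json,
    "markdown": compress_markdown,
    "text": truncate,
}


def run_pipeline(
    args: dict,
    upstream: Callable[[dict], str],
    surface: Callable[[dict], list[str]],
) -> str:
    request = clean(args)                                 # 1. CLEAN
    response = upstream(request)                          # proxied tool call
    compress = STRATEGIES[detect_content_type(response)]
    result = compress(response)                           # 2. COMPRESS
    memories = surface(request)                           # 3. SURFACE
    if memories:
        result += "\n\n[memories]\n" + "\n".join(f"- {m}" for m in memories)
    # 4. INDEX: deliberately a no-op, mirroring the standalone server today.
    return result


result = run_pipeline(
    {"path": "  data.json "},
    upstream=lambda req: '{ "name": "demo",  "items": [1, 2, 3] }',
    surface=lambda req: ["prefers compact JSON"],
)
print(result)
```

Keeping strategy selection in a lookup table means a new content type only needs a detector branch and one more entry, which is one plausible way to support auto-selection across many strategies.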
Relationship to LTM
STM and LTM are independent packages with no Python dependency between them. They communicate only via the MCP protocol, and each can be deployed and upgraded separately.
| | LTM (memtomem) | STM (memtomem-stm) |
|---|---|---|
| Role | Persistent storage & search | Real-time proxy & compression |
| Required? | Yes (core) | Optional |
| Communication | Direct MCP server | MCP proxy → queries LTM |
Package Info
| | |
|---|---|
| PyPI | memtomem-stm |
| Latest release | 0.1.24 |
| CLI | `mms` |
| License | Apache 2.0 |
| GitHub | memtomem/memtomem-stm |
Next Steps
- Quick Start: from install to agent connection
- Proactive Surfacing: 5-level gating and feedback auto-tuning
- Compression Strategies: 10 strategies and auto-selection logic
- Context Gateway: cross-runtime sync (LTM feature)
- MCP Tools: STM management tools
- CLI Reference: `mms` command reference