Compression Strategies

memtomem-stm automatically compresses MCP tool responses by content type to save tokens. It provides 10 strategies that reduce response size while preserving the information the agent needs.

| Strategy | Target content | Behavior |
| --- | --- | --- |
| truncate | Small text | Length-limited truncation (default fallback) |
| hybrid | Markdown | Preserve structure + abbreviate non-essential sections |
| selective | General text | Keep only query-relevant portions |
| progressive | Large content | Cursor-based sequential delivery (zero information loss) |
| extract_fields | JSON dictionaries | Extract key fields only |
| schema_pruning | JSON arrays | Preserve schema + reduce samples |
| skeleton | API docs | Preserve structural skeleton only |
| llm_summary | Complex text | LLM-based summarization (Ollama/OpenAI) |
| auto | All types | Analyze content and auto-select optimal strategy |
| none | | Pass through original without compression |

The auto strategy (default) analyzes content to pick the optimal strategy:

| Content type | Selected strategy |
| --- | --- |
| JSON dictionary | extract_fields |
| Large JSON array | schema_pruning |
| Markdown document | hybrid |
| API documentation | skeleton |
| Small text (< threshold) | truncate |
| Other large text | selective |
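
The selection table above can be read as a simple dispatch on content shape. The following is a minimal sketch, not memtomem-stm's actual implementation: the function name `pick_strategy`, both threshold constants, the detection heuristics, and the order in which the checks run are all assumptions made for illustration.

```python
import json
import re

# Hypothetical thresholds; the real values are not given in this document.
SMALL_TEXT_CHARS = 2000
LARGE_ARRAY_ITEMS = 20

# Hypothetical heuristics for recognizing API docs and Markdown.
API_DOC_RE = re.compile(r"\b(?:GET|POST|PUT|PATCH|DELETE)\s+/\S*")
MARKDOWN_RE = re.compile(r"^#{1,6} ", re.MULTILINE)


def pick_strategy(content: str) -> str:
    """Return a strategy name by following the auto-selection table."""
    try:
        parsed = json.loads(content)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        parsed = None

    if isinstance(parsed, dict):
        return "extract_fields"
    if isinstance(parsed, list) and len(parsed) >= LARGE_ARRAY_ITEMS:
        return "schema_pruning"
    # API docs are checked before Markdown because they are often written in it.
    if API_DOC_RE.search(content):
        return "skeleton"
    if MARKDOWN_RE.search(content):
        return "hybrid"
    if len(content) < SMALL_TEXT_CHARS:
        return "truncate"
    return "selective"


if __name__ == "__main__":
    print(pick_strategy('{"status": "ok"}'))
    print(pick_strategy("# Guide\n\nSome body text."))
```

Checking structured formats first keeps the cheaper text heuristics from misclassifying JSON payloads that happen to contain Markdown-like characters.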

During compression, the agent’s current query is taken into account: sections relevant to the query receive a larger token budget. For example, if the agent is asking about the authentication module while API documentation is being compressed, auth-related endpoints get more space.
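
One way to picture query-aware budgeting is to weight each section by how many query terms it shares and split the token budget proportionally. This is only a sketch under that assumption: `allocate_budgets` is a hypothetical name, and plain word overlap stands in for whatever relevance scoring memtomem-stm actually uses.

```python
def allocate_budgets(
    sections: dict[str, str], query: str, total_tokens: int
) -> dict[str, int]:
    """Split total_tokens across sections, favoring query-relevant ones."""
    query_terms = set(query.lower().split())
    weights: dict[str, int] = {}
    for title, body in sections.items():
        words = set(f"{title} {body}".lower().split())
        # Baseline weight of 1 so that no section is dropped entirely.
        weights[title] = 1 + len(query_terms & words)
    total_weight = sum(weights.values())
    return {
        title: total_tokens * weight // total_weight
        for title, weight in weights.items()
    }


if __name__ == "__main__":
    docs = {
        "Authentication": "login token authentication endpoints",
        "Billing": "invoices and payments",
    }
    print(allocate_budgets(docs, "authentication module", 300))
```

In this example the Authentication section matches one query term, so it receives twice the weight of the Billing section.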

Zero Information Loss: Progressive Delivery

The progressive strategy delivers large content without any information loss:

  1. First response delivers a table of contents (TOC) and the first chunk
  2. Agent requests more → cursor-based delivery of subsequent chunks
  3. Full content can be inspected sequentially
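
The three steps above can be sketched as a cursor-based reader. The function names, the fixed-size character chunking, and the response fields (`toc`, `chunk`, `next_cursor`) are hypothetical and chosen for illustration; the real tool interface may look different.

```python
def read_chunk(content: str, cursor: int = 0, chunk_chars: int = 1000) -> dict:
    """Return the chunk at `cursor`, plus a TOC on the first call."""
    chunks = [
        content[i:i + chunk_chars] for i in range(0, len(content), chunk_chars)
    ] or [""]  # empty content still yields one (empty) chunk
    if not 0 <= cursor < len(chunks):
        raise IndexError(f"cursor {cursor} out of range")

    has_more = cursor + 1 < len(chunks)
    response = {
        "chunk": chunks[cursor],
        "next_cursor": cursor + 1 if has_more else None,
    }
    if cursor == 0:
        # The first response carries a table of contents for all chunks.
        response["toc"] = [f"{i}: {c[:30]}" for i, c in enumerate(chunks)]
    return response


def read_all(content: str, chunk_chars: int = 1000) -> str:
    """Follow cursors until exhausted, as an agent would."""
    parts, cursor = [], 0
    while cursor is not None:
        response = read_chunk(content, cursor, chunk_chars)
        parts.append(response["chunk"])
        cursor = response["next_cursor"]
    return "".join(parts)


if __name__ == "__main__":
    text = "abcdefghij"
    # Reassembling every chunk reproduces the original exactly.
    print(read_all(text, chunk_chars=4) == text)
```

Because every chunk is delivered verbatim and the cursors cover the whole input, following them to the end recovers the original content, which is what "zero information loss" refers to here.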

When the compression ratio guardrail (default 65% preservation) is violated, a 3-tier fallback activates automatically:

progressive → hybrid → truncate

Each tier is checked against the guardrail in order; the output of the first strategy that satisfies it is used.
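
A fallback chain of this kind might look like the sketch below. Several things here are assumptions: that the guardrail compares output length to input length, that the final tier's output is used when no tier passes, and the name `compress_with_fallback`. The lambdas in the usage example are stand-ins, not the real strategies.

```python
from typing import Callable

Compressor = Callable[[str], str]


def compress_with_fallback(
    content: str,
    tiers: list[tuple[str, Compressor]],
    min_preserved: float = 0.65,
) -> tuple[str, str]:
    """Try each tier in order; return the first output that meets the guardrail."""
    if not tiers:
        raise ValueError("at least one tier is required")

    name, output = "", ""
    for name, compress in tiers:
        output = compress(content)
        # Assumption: preservation is measured as a character-length ratio.
        preserved = len(output) / len(content) if content else 1.0
        if preserved >= min_preserved:
            return name, output
    # Assumption: if no tier passes, fall through to the last tier's output.
    return name, output


if __name__ == "__main__":
    stand_in_tiers = [
        ("progressive", lambda s: s[:10]),
        ("hybrid", lambda s: s[:70]),
        ("truncate", lambda s: s[:90]),
    ]
    name, _ = compress_with_fallback("x" * 100, stand_in_tiers)
    print(name)
```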

Agent feedback automatically adjusts per-tool compression budgets:

  • Agent reports information loss → Increase preservation ratio for that tool
  • Agent reports response too long → Decrease preservation ratio
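
The feedback loop can be illustrated with a small per-tool tuner. The class name, the feedback labels (`information_loss`, `too_long`), the step size, and the clamping bounds are hypothetical; only the direction of each adjustment comes from the list above.

```python
class FeedbackTuner:
    """Track a preservation ratio per tool and nudge it on agent feedback."""

    def __init__(
        self,
        default_ratio: float = 0.65,
        step: float = 0.05,
        low: float = 0.10,
        high: float = 1.00,
    ) -> None:
        self.default_ratio = default_ratio
        self.step = step
        self.low = low
        self.high = high
        self.ratios: dict[str, float] = {}

    def report(self, tool: str, feedback: str) -> float:
        """Apply one feedback event and return the tool's new ratio."""
        ratio = self.ratios.get(tool, self.default_ratio)
        if feedback == "information_loss":
            ratio += self.step  # preserve more next time
        elif feedback == "too_long":
            ratio -= self.step  # compress harder next time
        else:
            raise ValueError(f"unknown feedback: {feedback!r}")
        # Clamp so repeated feedback cannot push the ratio out of range.
        ratio = min(self.high, max(self.low, ratio))
        self.ratios[tool] = ratio
        return ratio


if __name__ == "__main__":
    tuner = FeedbackTuner()
    tuner.report("search_docs", "information_loss")
    print(tuner.ratios)
```

Keeping the ratio per tool means feedback about one noisy tool does not change how responses from other tools are compressed.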