Hybrid Search

memtomem’s hybrid search combines keyword search and semantic search, leveraging both exact term matching and meaning-based similarity in a single query.

Hybrid search runs two search engines in parallel and merges their results with a rank-fusion step:

Component | Based on | Role
BM25 | SQLite FTS5 | Exact keyword/term matching. Strong for unique identifiers like “FastAPI” and “mem_search”
Vector search | sqlite-vec + ONNX/Ollama/OpenAI embeddings | Semantic similarity. Can match “how to deploy” → “deployment checklist”
RRF fusion | Reciprocal Rank Fusion | Combines the rankings from both engines into a final score
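Reciprocal Rank Fusion scores each result by its position in every ranked list rather than by the raw BM25 or cosine scores, which live on different scales. The sketch below assumes the textbook formula, score(d) = Σ 1/(k + rank), with k = 60 (the value used in the original RRF paper). memtomem's actual constant and any per-engine weighting are not documented here, and `rrf_fuse` and the chunk IDs are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion (hypothetical helper).

    Each ID scores sum(1 / (k + rank)) over every list it appears in, with
    1-based ranks. k = 60 is an assumption, not a documented memtomem default.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up result lists from the keyword (BM25) and semantic (vector) engines.
bm25_hits = ["deploy.md#checklist", "fastapi.md#setup", "notes.md#todo"]
vector_hits = ["deploy.md#checklist", "ops.md#rollout", "fastapi.md#setup"]

for doc_id, score in rrf_fuse([bm25_hits, vector_hits]):
    print(f"{score:.4f}  {doc_id}")
```

A chunk found by both engines (here the deploy checklist) outranks chunks found by only one, which is why fusion rewards agreement between keyword and semantic matches.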

During indexing, documents are split into meaningful units based on their structure rather than on token count. Six chunking strategies are available:

Strategy | Target | Behavior
Markdown | .md files | Split by heading level, preserving hierarchy
Python AST | .py files | Split by function/class, including docstrings
JS/TS AST | .js, .ts files | tree-sitter-based function/module splitting
JSON | .json files | Structure-aware splitting
YAML/TOML | .yaml, .toml files | Key-value block splitting
Plain text | Other files | Paragraph/newline-based splitting
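To illustrate heading-based splitting, here is a minimal Markdown chunker that starts a new chunk at every ATX heading (`#` to `######`) and records the chain of parent headings, so each chunk keeps its place in the hierarchy. It is a hypothetical sketch, not memtomem's chunker: `Chunk` and `chunk_markdown` are illustrative names, and it ignores setext headings and `~~~` fences.

```python
import re
from dataclasses import dataclass

# ATX heading: 1-6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:  # hypothetical chunk record
    path: list[str]  # heading hierarchy, outermost first
    text: str        # section body under the innermost heading


def chunk_markdown(source: str) -> list[Chunk]:
    """Split Markdown at headings, recording each section's parent headings."""
    chunks: list[Chunk] = []
    stack: list[tuple[int, str]] = []  # open headings as (level, title)
    body: list[str] = []
    in_fence = False

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk([title for _, title in stack], text))
        body.clear()

    for line in source.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '#' lines inside code fences are not headings
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new heading closes any open heading at the same or deeper level.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks


doc = "# Guide\nIntro text.\n## Install\nRun the installer.\n```\n# not a heading\n```\n## Usage\nCall mem_search.\n"
for chunk in chunk_markdown(doc):
    print(" > ".join(chunk.path), "|", chunk.text.splitlines()[0])
```

Keeping the heading path with each chunk means a hit on "Call mem_search." can still be shown under "Guide > Usage", which is the practical benefit of splitting by structure instead of by token count.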

Instead of full re-indexing, only changed chunks are updated:

  1. Store a SHA-256 hash of each chunk's content
  2. On re-index, compare hashes to detect changes
  3. Only re-embed changed chunks

Because unchanged chunks are never re-embedded, indexing cost stays low even for large document sets.
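The three steps can be sketched with Python's `hashlib`. This version assumes chunks are tracked purely by the SHA-256 digest of their text, so an edited chunk shows up as one new hash plus one stale hash. How memtomem actually keys chunks in its database is not described here, and `chunk_hash` and `plan_reindex` are hypothetical helpers.

```python
import hashlib


def chunk_hash(text: str) -> str:
    """Return the SHA-256 hex digest of a chunk's text (step 1)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(stored_hashes: set[str], chunks: list[str]) -> tuple[list[str], set[str]]:
    """Compare freshly produced chunks against stored hashes (step 2).

    Returns the chunks that need new embeddings (step 3) and the stored
    hashes that no longer match any chunk, whose vectors can be removed.
    """
    current = {chunk_hash(text): text for text in chunks}
    to_embed = [text for digest, text in current.items() if digest not in stored_hashes]
    stale = stored_hashes - set(current)
    return to_embed, stale


old_chunks = ["# Setup\nInstall deps.", "# Deploy\nRun the checklist."]
stored = {chunk_hash(text) for text in old_chunks}

new_chunks = ["# Setup\nInstall deps.", "# Deploy\nRun the updated checklist."]
to_embed, stale = plan_reindex(stored, new_chunks)
print(len(to_embed), len(stale))  # 1 1
```

Only the edited Deploy section is sent to the embedding model; hashing every chunk is cheap compared with calling an embedding model on it.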

Namespaces organize memories into scoped groups:

  • Namespaces are derived automatically from folder names
  • Filter by namespace when searching
  • Support agent-level isolation and sharing in multi-agent environments

memtomem also provides the following memory maintenance features:

  • Near-duplicate detection: automatically identifies memories with nearly identical content
  • Time-based decay: gradually decreases the search weight of older memories
  • TTL expiration: automatically deletes memories past their configured lifespan
  • Auto-tagging: automatically assigns tags based on content analysis
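Namespace filtering, TTL expiration, and time-based decay can be pictured as a post-processing pass over search hits. Everything in this sketch is an assumption for illustration: the namespace is taken to be the immediate parent folder, decay is exponential with a 30-day half-life, and `Memory` and `apply_lifecycle` are hypothetical names. memtomem's real rules and defaults may differ, and a real store would delete expired memories rather than merely skip them.

```python
import time
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

DAY = 86_400.0  # seconds per day


@dataclass
class Memory:  # hypothetical record for one search hit
    source_path: str                     # file the chunk was indexed from
    score: float                         # relevance score from hybrid search
    created_at: float                    # Unix timestamp, seconds
    ttl_seconds: Optional[float] = None  # None means the memory never expires

    @property
    def namespace(self) -> str:
        # Assumption: namespace = name of the immediate parent folder.
        return PurePosixPath(self.source_path).parent.name or "default"


def apply_lifecycle(hits, namespace=None, half_life_days=30.0, now=None):
    """Filter by namespace, skip expired hits, and apply exponential time decay."""
    now = time.time() if now is None else now
    ranked = []
    for m in hits:
        if namespace is not None and m.namespace != namespace:
            continue  # outside the requested namespace
        age = now - m.created_at
        if m.ttl_seconds is not None and age > m.ttl_seconds:
            continue  # past its TTL; a real store would delete it
        # Weight halves every half_life_days (assumed decay curve).
        decay = 0.5 ** (age / (half_life_days * DAY))
        ranked.append((m.score * decay, m))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


now = 100 * DAY
hits = [
    Memory("notes/work/deploy.md", 0.9, now - 60 * DAY),
    Memory("notes/work/fastapi.md", 0.6, now - 1 * DAY),
    Memory("notes/work/scratch.md", 0.8, now - 10 * DAY, ttl_seconds=7 * DAY),
    Memory("notes/personal/diary.md", 0.95, now),
]
for score, m in apply_lifecycle(hits, namespace="work", now=now):
    print(f"{score:.3f}  {m.namespace}/{PurePosixPath(m.source_path).name}")
```

With this data, the 60-day-old deploy note (raw score 0.9) falls below the day-old FastAPI note (raw score 0.6), the expired scratch note is dropped, and the note in the `personal` namespace is filtered out.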