Hybrid Search
memtomem’s hybrid search combines keyword search and semantic search, leveraging both exact term matching and meaning-based similarity in a single query.
Search Architecture
Hybrid search has three components. Two search engines run in parallel, and a fusion step merges their rankings:
| Component | Based on | Strength |
|---|---|---|
| BM25 | SQLite FTS5 | Exact keyword/term matching. Strong for unique identifiers like “FastAPI”, “mem_search” |
| Vector search | sqlite-vec + ONNX/Ollama/OpenAI embeddings | Semantic similarity. Can match “how to deploy” → “deployment checklist” |
| RRF fusion | Reciprocal Rank Fusion | Combines rankings from both engines into a final score |
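
The fusion step in the table can be sketched in a few lines. This is a minimal illustration of Reciprocal Rank Fusion in general, not memtomem's actual code: the function name `rrf_fuse` is hypothetical, and the constant `k=60` is the value commonly used for RRF, assumed here because the page does not state which value memtomem uses.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first (e.g. BM25 hits, vector hits).
    k: smoothing constant (assumed value; 60 is the common default for RRF).
    Returns (doc_id, score) pairs sorted by fused score, best first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it returns.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["a", "b", "c"]    # keyword ranking (illustrative IDs)
vector_hits = ["b", "d", "a"]  # semantic ranking (illustrative IDs)
fused = rrf_fuse([bm25_hits, vector_hits])
```

Because RRF uses only rank positions, the BM25 scores and the vector distances never need to be normalized onto a common scale. A document that appears in both lists accumulates two contributions and tends to rise above documents found by only one engine.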
Semantic Chunking
During indexing, documents are split into meaningful units based on structure rather than token count. There are six chunking strategies:
| Strategy | Target | Behavior |
|---|---|---|
| Markdown | .md files | Split by heading level, preserving hierarchy |
| Python AST | .py files | Split by function/class, including docstrings |
| JS/TS AST | .js, .ts files | tree-sitter based function/module splitting |
| JSON | .json files | Structure-aware splitting |
| YAML/TOML | .yaml, .toml | Key-value block splitting |
| Plain text | Other files | Paragraph/newline based splitting |
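
To make the idea of structure-based splitting concrete, here is a minimal sketch of the Python AST strategy using only the standard library `ast` module. The function name `chunk_python_source` and the shape of the returned dictionaries are assumptions for illustration. memtomem's real chunker may differ, for example in how it treats nested functions or module-level code.

```python
import ast


def chunk_python_source(source):
    """Split Python source into one chunk per top-level function or class.

    Hypothetical helper: returns a list of dicts with the chunk's name, kind,
    line range, docstring, and source text.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # node.lineno points at the def/class line, so include decorators above it.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                "start_line": start,
                "end_line": node.end_lineno,
                "docstring": ast.get_docstring(node),
                "text": "\n".join(lines[start - 1:node.end_lineno]),
            })
    return chunks


example_source = (
    "import os\n"
    "\n"
    "def greet(name):\n"
    '    """Say hello."""\n'
    '    return f"hi {name}"\n'
    "\n"
    "class Box:\n"
    '    """A box."""\n'
    "    def open(self):\n"
    "        return True\n"
)
chunks = chunk_python_source(example_source)
```

Each chunk keeps its docstring next to the code it documents, which is the property the table describes. A splitter based on token counts could cut a function in half and separate the two.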
Incremental Indexing
Instead of re-indexing everything, only changed chunks are updated:
- Store SHA-256 hash for each chunk
- On re-index, compare hashes to detect changes
- Only re-embed changed chunks
This minimizes indexing cost even for large document sets.
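
The hash comparison above can be sketched as follows. This is an illustration under stated assumptions, not memtomem's implementation: the names `chunk_hash` and `plan_reindex` are hypothetical, and chunks are assumed to be addressed by a stable ID.

```python
import hashlib


def chunk_hash(text):
    """Return the SHA-256 hex digest of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(stored_hashes, current_chunks):
    """Decide which chunks need re-embedding.

    stored_hashes: {chunk_id: sha256 hex digest} saved by the previous index run.
    current_chunks: {chunk_id: text} produced by chunking the files now.
    Returns (to_embed, removed, new_hashes).
    """
    to_embed = []
    new_hashes = {}
    for chunk_id, text in current_chunks.items():
        digest = chunk_hash(text)
        new_hashes[chunk_id] = digest
        # New chunks have no stored hash; edited chunks have a different one.
        if stored_hashes.get(chunk_id) != digest:
            to_embed.append(chunk_id)
    # Chunks that disappeared from the source should be dropped from the index.
    removed = [cid for cid in stored_hashes if cid not in current_chunks]
    return to_embed, removed, new_hashes


stored = {cid: chunk_hash(t) for cid, t in
          {"a": "alpha", "b": "beta", "c": "gamma"}.items()}
current = {"a": "alpha", "b": "beta v2", "d": "delta"}
to_embed, removed, new_hashes = plan_reindex(stored, current)
```

Hashing is cheap compared with computing an embedding, so comparing digests first means the embedding model runs only on chunks whose text actually changed.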
Namespaces
Organize memories into scoped groups:
- Namespace auto-derived from folder names
- Filter by namespace when searching
- Support agent-level isolation and sharing in multi-agent environments
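
As a rough illustration of the first two points, the sketch below derives a namespace from the first folder under an indexed root and then filters results by it. The page does not specify the exact derivation rule, so the function names, the `"default"` fallback, and the result dictionaries are all assumptions.

```python
from pathlib import PurePosixPath


def namespace_for(path, root, default="default"):
    """Derive a namespace from the first folder below the indexed root.

    Hypothetical rule: files directly in the root fall back to `default`.
    Raises ValueError if `path` is not located under `root`.
    """
    relative = PurePosixPath(path).relative_to(PurePosixPath(root))
    return relative.parts[0] if len(relative.parts) > 1 else default


def filter_by_namespace(results, namespace):
    """Keep only search results that belong to the given namespace."""
    return [r for r in results if r["namespace"] == namespace]


results = [
    {"path": "/notes/work/plan.md", "score": 0.9},
    {"path": "/notes/home/recipes.md", "score": 0.8},
    {"path": "/notes/todo.md", "score": 0.7},
]
for r in results:
    r["namespace"] = namespace_for(r["path"], "/notes")

work_only = filter_by_namespace(results, "work")
```

In a multi-agent setup, the same filter could give each agent its own namespace for isolation while leaving a common namespace readable by all of them.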
Maintenance Features
- Near-duplicate detection: automatically identify memories with nearly identical content
- Time-based decay: gradually decrease search weight for older memories
- TTL expiration: automatically delete memories past their configured lifespan
- Auto-tagging: automatically assign tags based on content analysis
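
Time-based decay and TTL expiration can be illustrated with a short sketch. The exponential half-life formula and the 30-day default are assumptions chosen for the example. The page does not say which decay curve or parameters memtomem uses, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def decayed_score(score, age_days, half_life_days=30.0):
    """Scale a search score so that it halves every `half_life_days` (assumed curve)."""
    return score * 0.5 ** (age_days / half_life_days)


def is_expired(created_at, ttl, now):
    """Return True once a memory has outlived its configured time to live."""
    return now - created_at > ttl


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
week = timedelta(days=7)

fresh = decayed_score(0.8, age_days=0)      # no decay yet
month_old = decayed_score(1.0, age_days=30)  # one half-life has passed
expired = is_expired(created, week, now=datetime(2024, 1, 9, tzinfo=timezone.utc))
```

The two mechanisms differ in effect: decay only lowers an older memory's ranking and leaves it searchable, whereas TTL expiration removes the memory entirely.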