# gptme-retrieval
Automatic context retrieval plugin for gptme.
This plugin adds a STEP_PRE hook that automatically retrieves relevant context before each LLM step, using backends like qmd for semantic and keyword search. A per-conversation deduplication layer ensures each document is injected at most once, so there's no context bloat in multi-step turns.
## Installation

```bash
# Install the plugin
pip install -e plugins/gptme-retrieval

# Make sure qmd is installed (for the default backend)
cargo install qmd
```
## Configuration

Configure the plugin in your `gptme.toml`:

```toml
[plugin.retrieval]
enabled = true        # Enable/disable retrieval (default: true)
backend = "qmd"       # Backend: "qmd", "gptme-rag", "grep", or a custom command
mode = "vsearch"      # qmd mode: "search" (BM25), "vsearch" (semantic), "query" (hybrid)
max_results = 5       # Maximum results to inject (default: 5)
threshold = 0.3       # Minimum score threshold (default: 0.3)
collections = []      # Optional: filter by collection names
inject_as = "system"  # "system" for visible, "hidden" for background context
```
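The defaults above can be modeled as a small config object. A minimal sketch, assuming the `[plugin.retrieval]` table arrives as a plain dict; the `RetrievalConfig` and `load_config` names are hypothetical, not part of the plugin's public API:

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalConfig:
    """Hypothetical model of [plugin.retrieval] with the documented defaults."""
    enabled: bool = True
    backend: str = "qmd"
    mode: str = "vsearch"
    max_results: int = 5
    threshold: float = 0.3
    collections: list[str] = field(default_factory=list)
    inject_as: str = "system"


def load_config(table: dict) -> RetrievalConfig:
    # Known keys override the defaults; unknown keys are ignored.
    known = {k: v for k, v in table.items()
             if k in RetrievalConfig.__dataclass_fields__}
    return RetrievalConfig(**known)


# Only the keys you set need to appear in gptme.toml.
cfg = load_config({"mode": "query", "max_results": 3})
```

Any option omitted from the table keeps its documented default, so a minimal config can enable the plugin without restating every field.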
## How It Works

1. **STEP_PRE hook**: Before each LLM step, the plugin extracts the last user message
2. **Retrieval**: Queries the configured backend with the user's message text
3. **Deduplication**: Checks each result against a per-conversation set of already-injected documents (keyed by source path + content hash)
4. **Injection**: Adds only new documents as a system message; silently skips the step if nothing new was retrieved
This approach is correct for both interactive and autonomous sessions:
- Interactive (multi-step): Fires on every tool-call step, but deduplication prevents the same document from being injected on each step of a single turn
- Autonomous (single-step): Behaves identically to TURN_PRE since there's only one step per turn
- Topic changes: When a new user message triggers a different retrieval result, new documents are injected immediately
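The deduplication step above can be sketched as a pure filter over retrieval results. A minimal illustration, with a hypothetical result shape (`source`/`content` dicts) and function names:

```python
import hashlib


def doc_key(source: str, content: str) -> str:
    """Key a document by source path + content hash, so an edited file re-injects."""
    digest = hashlib.sha256(content.encode()).hexdigest()[:16]
    return f"{source}:{digest}"


def filter_new(results: list[dict], seen: set[str]) -> list[dict]:
    """Return only documents not yet injected in this conversation; mark them seen."""
    fresh = []
    for doc in results:
        key = doc_key(doc["source"], doc["content"])
        if key not in seen:
            seen.add(key)
            fresh.append(doc)
    return fresh


seen: set[str] = set()  # one set per conversation
step1 = filter_new([{"source": "a.md", "content": "hooks overview"}], seen)
# The same document retrieved again later in the turn yields nothing new:
step2 = filter_new([{"source": "a.md", "content": "hooks overview"}], seen)
```

Because the key includes a content hash rather than just the path, a document that changes on disk between steps would be injected again with its new content.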
## Backends

### qmd (default)

Uses qmd for retrieval. Supports three modes:

- `search`: BM25 keyword search
- `vsearch`: Semantic/vector search
- `query`: Hybrid search combining both
### gptme-rag

Uses gptme-rag for retrieval. It integrates well with gptme:

```toml
[plugin.retrieval]
backend = "gptme-rag"
```

Install with: `pipx install gptme-rag`

Note: gptme-rag is currently experimental and may have issues.
### grep

Simple grep-based fallback for basic keyword matching.
### Custom

Any command that accepts a query and outputs JSON can be used:

```toml
[plugin.retrieval]
backend = "my-search --json"
```
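Applying the `threshold` and `max_results` settings to such a backend's output might look like this. A sketch under the assumption that the command prints a JSON array of objects with `source`, `content`, and `score` fields; the exact output schema is backend-defined:

```python
import json


def parse_results(raw: str, threshold: float = 0.3, max_results: int = 5) -> list[dict]:
    """Parse a custom backend's JSON output, drop low-scoring hits, cap the count."""
    hits = json.loads(raw)
    kept = [h for h in hits if h.get("score", 0.0) >= threshold]
    kept.sort(key=lambda h: h["score"], reverse=True)
    return kept[:max_results]


# Simulated backend output (in practice, captured from the command's stdout):
raw = json.dumps([
    {"source": "a.md", "content": "...", "score": 0.9},
    {"source": "b.md", "content": "...", "score": 0.1},  # below threshold, dropped
])
results = parse_results(raw)
```

Results that survive the threshold would then pass through the same deduplication layer as the built-in backends before injection.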
## Indexing Conversations

To index gptme conversations with qmd:

```bash
# Extract user/assistant messages (skip system); -c emits one JSON object per line
jq -c 'select(.role != "system") | {role, content}' ~/.local/share/gptme/logs/*/conversation.jsonl > filtered.jsonl

# Index with qmd
qmd index filtered.jsonl --collection conversations
```