Consortium Plugin for gptme
Multi-model consensus decision-making for gptme.
Status: ✅ Phase 1 Complete - Core functionality implemented and tested
Overview
The consortium plugin poses the same question to several frontier LLMs to gather diverse perspectives, then uses an arbiter model to analyze the responses and synthesize a consensus answer with a confidence score.
Key improvements (Phase 1):
- ✅ Real model integration via gptme.llm
- ✅ Robust JSON extraction (handles markdown code blocks, embedded JSON)
- ✅ Error handling with graceful fallbacks
- ✅ Comprehensive test coverage (14 unit tests + integration tests)
- ✅ Type-safe confidence scoring
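To illustrate what "robust JSON extraction" can mean in practice, here is a minimal sketch, not the plugin's actual code. It tries the whole response first, then any fenced code block, then any embedded object. The name `extract_json` is hypothetical and chosen only for this example.

```python
import json
import re
from typing import Any, Dict, Optional

# Built programmatically so this example does not contain a literal fence.
_FENCE = "`" * 3
_FENCE_RE = re.compile(_FENCE + r"(?:json)?\s*(.*?)" + _FENCE, re.DOTALL)


def extract_json(text: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object found in text, or None (hypothetical helper)."""
    # 1. The whole response, then the body of each fenced code block.
    candidates = [text.strip()] + [m.strip() for m in _FENCE_RE.findall(text)]
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed

    # 2. An object embedded in surrounding prose: try decoding at every "{".
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char != "{":
            continue
        try:
            parsed, _end = decoder.raw_decode(text, index)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None
```

Returning `None` instead of raising lets the caller decide how to fall back when the arbiter's output cannot be parsed.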
Features
- Multi-model orchestration: Query multiple models with a single call
- Consensus synthesis: Arbiter model synthesizes best answer
- Confidence scoring: Quantifies agreement between models
- Flexible configuration: Choose models and arbiter
- Detailed output: See individual responses and synthesis reasoning
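The query-then-arbitrate flow behind these features can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: `run_consortium` and the `ask(model, prompt)` callable are hypothetical, and the arbiter is assumed to reply with plain JSON.

```python
import json
from typing import Callable, Dict, List

# Hypothetical signature: ask(model, prompt) -> response text.
AskFn = Callable[[str, str], str]


def run_consortium(
    question: str, models: List[str], arbiter: str, ask: AskFn
) -> Dict[str, object]:
    # Step 1: collect one answer per model (sequentially, for simplicity).
    responses = {model: ask(model, question) for model in models}

    # Step 2: show every answer to the arbiter and request a JSON verdict.
    listing = "\n\n".join(f"[{m}]\n{r}" for m, r in responses.items())
    prompt = (
        f"Question: {question}\n\nResponses:\n{listing}\n\n"
        'Reply with JSON: {"consensus": str, "confidence": float, "reasoning": str}'
    )
    verdict = json.loads(ask(arbiter, prompt))

    # Step 3: return the synthesis together with the raw responses.
    return {
        "consensus": verdict["consensus"],
        "confidence": float(verdict["confidence"]),
        "reasoning": verdict.get("reasoning", ""),
        "responses": responses,
        "arbiter": arbiter,
    }
```

Injecting `ask` keeps the sketch independent of any particular provider and makes it easy to test with a stub.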
Installation
The plugin is automatically discovered when placed in a configured plugin path. Add to your gptme.toml (user or project level):
[plugins]
paths = ["path/to/plugins"]
enabled = ["gptme_consortium"] # Optional: limit which plugins load
Usage
Basic Query
query_consortium(
    question="What's the best approach for handling rate limiting?"
)
With Custom Models
query_consortium(
    question="Should we use microservices or monolith?",
    models=[
        "anthropic/claude-sonnet-4-5",
        "openai/gpt-4o",
        "openai/o1"
    ],
    arbiter="anthropic/claude-opus-4"
)
With Confidence Threshold
query_consortium(
    question="Critical architectural decision...",
    confidence_threshold=0.9  # Require 90% confidence
)
Output Format
The tool returns:
- Consensus: Synthesized answer incorporating all perspectives
- Confidence: Score from 0 to 1 indicating how strongly the models agree
- Individual Responses: Each model's perspective
- Synthesis Reasoning: Why the arbiter chose this consensus
- Metadata: Models used, arbiter model
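One way to picture these fields is as a small data structure. The sketch below is hypothetical: the class name `ConsortiumResult`, its field names, and the `meets` helper are assumptions made for illustration, and the plugin's real return type may differ. What the tool does when a result falls below `confidence_threshold` is not specified here, so the helper only reports whether the threshold is met.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConsortiumResult:
    """Hypothetical container mirroring the fields listed above."""

    consensus: str                 # synthesized answer
    confidence: float              # agreement score between 0 and 1
    responses: Dict[str, str]      # model name -> individual response
    reasoning: str                 # why the arbiter chose this consensus
    models: List[str] = field(default_factory=list)  # metadata: models used
    arbiter: str = ""              # metadata: arbiter model

    def meets(self, threshold: float) -> bool:
        """Return True when the confidence reaches the given threshold."""
        return self.confidence >= threshold


result = ConsortiumResult(
    consensus="Start with a monolith",
    confidence=0.85,
    responses={"model-a": "Monolith first", "model-b": "Monolith, split later"},
    reasoning="Both responses recommend starting simple.",
    models=["model-a", "model-b"],
    arbiter="model-c",
)
```

With this example, `result.meets(0.8)` is true while `result.meets(0.9)` is false, which is the kind of check a caller might use before acting on a high-stakes answer.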
Use Cases
- Architectural Decisions: Get multiple expert perspectives
- Code Review: Multiple models review the same code
- Quality Checking: Validate important outputs
- Model Comparison: See how different models approach a problem
- High-Stakes Decisions: Require consensus before proceeding
Implementation Status
✅ Phase 1 Complete (Core Functionality)
- Real model integration via gptme.llm.reply()
- Robust JSON parsing from arbiter responses
- Error handling with fallback synthesis
- Comprehensive test suite (14 tests, 100% pass)
- Confidence type validation
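As an illustration of confidence type validation, the sketch below coerces whatever the arbiter returned into a float between 0 and 1. It is an assumption about what such validation could look like, not the plugin's code. The name `coerce_confidence` and the default of 0.0 for unusable values are hypothetical.

```python
import math
from typing import Any


def coerce_confidence(value: Any, default: float = 0.0) -> float:
    """Turn an arbiter-supplied confidence into a float in [0, 1]."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool):
        return default
    try:
        score = float(value)  # accepts numbers and numeric strings like "0.85"
    except (TypeError, ValueError):
        return default
    if math.isnan(score):
        return default
    # Clamp out-of-range scores instead of failing.
    return min(1.0, max(0.0, score))
```

Falling back to a low default means an unparseable score never passes a confidence threshold by accident.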
🚧 Phase 2 Planned (Advanced Features)
- Iterative refinement (multi-round consensus)
- Response caching (avoid redundant queries)
- Parallel querying (faster execution)
- Voting mechanisms (for discrete choices)
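The planned parallel querying could be approached with a thread pool from the standard library. The sketch below is one possible shape, not a committed design. `query_all` and the `ask(model, prompt)` callable are hypothetical names, and a failing model is recorded separately, so one provider error does not abort the whole query.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: ask(model, prompt) -> response text.
AskFn = Callable[[str, str], str]


def query_all(
    question: str, models: List[str], ask: AskFn, max_workers: int = 4
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Query every model concurrently; return (responses, errors)."""
    responses: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the model it belongs to.
        futures = {pool.submit(ask, model, question): model for model in models}
        for future in as_completed(futures):
            model = futures[future]
            try:
                responses[model] = future.result()
            except Exception as exc:  # keep going if one provider fails
                errors[model] = str(exc)
    return responses, errors
```

Threads suit this workload because each call spends most of its time waiting on the network. An asyncio-based version would be an equally reasonable choice.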
🔮 Phase 3 Future (Production Polish)
- Detailed metadata tracking (tokens, costs)
- Custom arbiter strategies
- Performance optimization
- Cost tracking dashboard
Dependencies
- gptme >= 0.27.0
- Access to configured LLM providers (Anthropic, OpenAI, etc.)
- Valid API keys in environment or config
Testing
# Run all tests
uv run --with pytest --with pytest-mock pytest tests/test_consortium.py -v
# Run fast tests only (skip integration)
uv run --with pytest --with pytest-mock pytest tests/ -v -m "not slow"
# Run with coverage
uv run --with pytest --with pytest-mock --with pytest-cov pytest tests/ --cov=src/gptme_consortium
Configuration
Default models (used if not specified):
- anthropic/claude-sonnet-4-5 (Claude Sonnet 4.5, Sept 2025)
- openai/gpt-5.1 (GPT-5.1, Nov 2025)
- google/gemini-3-pro (Gemini 3 Pro, Nov 2025)
- xai/grok-4 (Grok 4)
Default arbiter:
- anthropic/claude-sonnet-4-5 (Claude Sonnet 4.5)
These represent diverse frontier models for comprehensive perspectives.
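The fallback to defaults can be pictured as below. The model identifiers come from the lists above, but the constant names and the `resolve_models` helper are hypothetical and only illustrate the "used if not specified" behavior.

```python
from typing import List, Optional, Tuple

# Identifiers copied from the defaults listed above.
DEFAULT_MODELS = [
    "anthropic/claude-sonnet-4-5",
    "openai/gpt-5.1",
    "google/gemini-3-pro",
    "xai/grok-4",
]
DEFAULT_ARBITER = "anthropic/claude-sonnet-4-5"


def resolve_models(
    models: Optional[List[str]] = None, arbiter: Optional[str] = None
) -> Tuple[List[str], str]:
    """Return the caller's choices, falling back to the defaults."""
    chosen = list(models) if models else list(DEFAULT_MODELS)
    return chosen, arbiter or DEFAULT_ARBITER
```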
Future Enhancements
- [ ] Iterative refinement with multiple rounds
- [ ] Voting mechanisms for discrete choices
- [ ] Integration with gptme's model configuration
- [ ] Caching of model responses
- [ ] Async parallel querying for speed
- [ ] Support for structured output formats
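For the voting item above, a simple majority vote over discrete choices might look like the following sketch. It is a sketch of one possible mechanism, not a planned API. The name `majority_vote` is hypothetical, and treating a tie as "no winner" is an assumption.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def majority_vote(votes: Dict[str, str]) -> Tuple[Optional[str], float]:
    """Return (winning choice, share of votes); the winner is None on a tie."""
    if not votes:
        return None, 0.0
    # Normalize case and whitespace so "Monolith" and "monolith " match.
    counts = Counter(choice.strip().lower() for choice in votes.values())
    ranked = counts.most_common()
    winner, top = ranked[0]
    share = top / len(votes)
    if len(ranked) > 1 and ranked[1][1] == top:
        return None, share  # tie between the leading choices
    return winner, share
```

The vote share doubles as a rough agreement measure, similar in spirit to the arbiter's confidence score.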