Image Generation Plugin for gptme
Multi-provider image generation for gptme.
Overview
The image generation plugin provides a unified interface for generating images from text descriptions across multiple providers: Google Gemini (Imagen), OpenAI DALL-E 3, and OpenAI DALL-E 2.
Features
- Multi-provider support: Gemini, DALL-E 3, DALL-E 2
- Unified interface: Same API across all providers
- Automatic file handling: Images saved to disk with metadata
- Quality options: Choose between standard and HD quality
- Flexible sizing: Provider-specific size options
- Multiple options generation: Generate variations for comparison (count parameter) [Phase 1]
- View integration: Display images to assistant for verification (view parameter) [Phase 1]
- Enhanced error handling: Clear error messages with context [Phase 1]
- Style presets: 8 predefined styles for consistent results (style parameter) [Phase 2]
- Prompt enhancement: Automatic quality and composition improvements (enhance parameter) [Phase 2]
- Progress indicators: Visual feedback during multi-image generation [Phase 2]
- Cost tracking: Automatic tracking and reporting of generation costs [Phase 3]
- Standalone CLI: Use from the terminal without gptme installed [Phase 4]
Installation
As gptme plugin
Add to your gptme.toml (user or project level):
[plugins]
paths = ["path/to/plugins"]
enabled = ["gptme_imagen"] # Optional: limit which plugins load
As standalone CLI
# Install with CLI support and your preferred provider
pip install 'gptme-imagen[cli,gemini]'
pip install 'gptme-imagen[cli,dalle]'
pip install 'gptme-imagen[cli,gemini,dalle]'
# Or install everything
pip install 'gptme-imagen[all]'
Set up API keys:
export GOOGLE_API_KEY="your-key" # For Gemini
export OPENAI_API_KEY="your-key" # For DALL-E
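Before generating, it can help to confirm that the key for the chosen provider is actually set. The following is a minimal sketch, not part of the plugin: `REQUIRED_KEYS` and `missing_key` are hypothetical names, and the mapping simply restates the key requirements listed in this README.

```python
import os

# Environment variable required by each provider, as listed in this README.
# The dict and the helper below are illustrative, not part of gptme-imagen.
REQUIRED_KEYS = {
    "gemini": "GOOGLE_API_KEY",
    "dalle": "OPENAI_API_KEY",
    "dalle2": "OPENAI_API_KEY",
}


def missing_key(provider, env=None):
    """Return the name of the unset key for `provider`, or None if it is set."""
    env = os.environ if env is None else env
    key = REQUIRED_KEYS[provider]
    # An empty string counts as unset.
    return None if env.get(key) else key


if __name__ == "__main__":
    for name in REQUIRED_KEYS:
        key = missing_key(name)
        print(f"{name}: {'ok' if key is None else 'missing ' + key}")
```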
CLI Usage
The gptme-imagen command works standalone, without the full gptme runtime:
# Generate an image
gptme-imagen generate "a sunset over mountains"
# Choose provider and style
gptme-imagen generate "tech logo" --provider dalle --style flat-design --quality hd
# Generate multiple images
gptme-imagen generate "logo variations" --count 3 --output logos/option.png
# Use reference images (Gemini only)
gptme-imagen generate "change background to beach" --images photo.png
# View costs and history
gptme-imagen cost
gptme-imagen history --limit 20
# List available styles
gptme-imagen styles
Plugin Usage
Basic Generation
generate_image(
prompt="A modern office workspace with clean design",
provider="gemini"
)
With Custom Output Path
generate_image(
prompt="Architecture diagram of microservices",
provider="gemini",
output_path="diagrams/architecture.png"
)
High Quality DALL-E
generate_image(
prompt="Professional logo design for tech startup",
provider="dalle",
quality="hd",
output_path="branding/logo.png"
)
Multiple Options (Phase 1 NEW)
Generate multiple variations for comparison:
generate_image(
prompt="Modern minimalist logo for tech startup",
provider="gemini",
count=3,
output_path="logos/option.png"
)
Output: logos/option_001.png, logos/option_002.png, logos/option_003.png
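The numbering shown above can be reproduced with a few lines of `pathlib`. This is a sketch of one way to derive the names, not the plugin's actual code: `numbered_paths` is a hypothetical helper, and leaving the path unchanged when `count` is 1 is an assumption.

```python
from pathlib import Path


def numbered_paths(output_path, count):
    """Expand one output path into `count` numbered paths (hypothetical helper)."""
    path = Path(output_path)
    if count == 1:
        # Assumption: a single image keeps the path exactly as given.
        return [path]
    # Insert a zero-padded, 1-based index between the stem and the suffix.
    return [
        path.with_name(f"{path.stem}_{i:03d}{path.suffix}")
        for i in range(1, count + 1)
    ]


if __name__ == "__main__":
    for p in numbered_paths("logos/option.png", 3):
        print(p.as_posix())
```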
View Integration (Phase 1 NEW)
Display generated images to the assistant for verification and feedback:
generate_image(
prompt="UI mockup for dashboard",
provider="gemini",
view=True,
output_path="mockups/dashboard.png"
)
The assistant can see the generated image and provide feedback like "The layout looks good, but the colors could be brighter."
Combined: Multiple Options with View
generate_image(
prompt="Logo concept with geometric shapes",
provider="gemini",
count=3,
view=True,
output_path="concepts/logo.png"
)
The assistant sees all 3 variations and can recommend the best one.
Providers
Gemini (Imagen 3)
- Model: imagen-3-fast-generate-001
- Best for: Fast, high-quality generations
- Requires: GOOGLE_API_KEY
DALL-E 3
- Model: dall-e-3
- Best for: Creative, detailed images
- Quality: standard, hd
- Requires: OPENAI_API_KEY
DALL-E 2
- Model: dall-e-2
- Best for: Faster, lower-cost generations
- Requires: OPENAI_API_KEY
Parameters
- prompt (required): Text description of image
- provider (optional): "gemini", "dalle", or "dalle2" (default: "gemini")
- size (optional): Image size like "1024x1024" (default: "1024x1024")
- quality (optional): "standard" or "hd" (default: "standard")
- output_path (optional): Save location (default: auto-generated)
- count (optional): Number of variations to generate (default: 1) [Phase 1 NEW]
- view (optional): Display generated images to assistant (default: False) [Phase 1 NEW]
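The parameters and defaults above can be summarized as a small data structure. The sketch below is illustrative only: `ImageRequest` and its `problems` method are hypothetical and do not exist in the plugin, and the checks are assumptions about what sensible validation might look like.

```python
import re
from dataclasses import dataclass
from typing import Optional

PROVIDERS = ("gemini", "dalle", "dalle2")
QUALITIES = ("standard", "hd")


@dataclass
class ImageRequest:
    """Hypothetical container mirroring the documented parameters and defaults."""

    prompt: str
    provider: str = "gemini"
    size: str = "1024x1024"
    quality: str = "standard"
    output_path: Optional[str] = None  # None means "auto-generated"
    count: int = 1
    view: bool = False

    def problems(self):
        """Return a list of human-readable validation problems (empty if none)."""
        found = []
        if not self.prompt.strip():
            found.append("prompt must not be empty")
        if self.provider not in PROVIDERS:
            found.append(f"unknown provider: {self.provider}")
        if not re.fullmatch(r"\d+x\d+", self.size):
            found.append(f"size must look like 1024x1024, got {self.size}")
        if self.quality not in QUALITIES:
            found.append(f"unknown quality: {self.quality}")
        if self.count < 1:
            found.append("count must be at least 1")
        return found
```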
Use Cases
- Technical Diagrams: Architecture, flow charts, system diagrams
- UI Mockups: Interface designs, wireframes
- Presentations: Illustrations, graphics, slides
- Documentation: Visual aids, examples
- Branding: Logos, icons, graphics
- Concept Art: Prototypes, visual exploration
Output
The tool returns:
- Provider: Which service generated the image
- Prompt: Original text description
- Image Path: Where the image was saved
- Metadata: Model, size, quality details
Dependencies
Required (install the packages for the providers you use):
pip install google-genai # For Gemini
pip install openai # For DALL-E
pip install requests # For image downloads
Phase 1 Enhancements (Completed)
- [x] Multiple options generation (count parameter)
- [x] View integration (view parameter)
- [x] Enhanced error handling
Phase 2 Enhancements (Completed)
Style Presets
Apply predefined style presets to enhance your prompts with consistent artistic direction:
Available Styles:
- photo: Photorealistic rendering
- illustration: Digital illustration style
- sketch: Hand-drawn sketch aesthetic
- technical-diagram: Clean technical visualization
- flat-design: Minimalist flat design
- cyberpunk: Futuristic neon aesthetic
- watercolor: Traditional watercolor painting
- oil-painting: Classic oil painting style
Usage:
generate_image(
prompt="mountain landscape",
style="watercolor",
provider="gemini"
)
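Conceptually, a style preset can be thought of as extra wording attached to the prompt. The sketch below shows that idea under stated assumptions: the `STYLE_HINTS` text reuses the style descriptions from this README, `apply_style` is a hypothetical helper, and the plugin's real preset wording and mechanism may differ.

```python
# Illustrative hint text taken from the style descriptions in this README;
# the plugin's actual preset wording is not documented here.
STYLE_HINTS = {
    "photo": "photorealistic rendering",
    "illustration": "digital illustration style",
    "sketch": "hand-drawn sketch aesthetic",
    "technical-diagram": "clean technical visualization",
    "flat-design": "minimalist flat design",
    "cyberpunk": "futuristic neon aesthetic",
    "watercolor": "traditional watercolor painting",
    "oil-painting": "classic oil painting style",
}


def apply_style(prompt, style=None):
    """Append the hint for `style` to `prompt` (hypothetical helper)."""
    if style is None:
        return prompt
    if style not in STYLE_HINTS:
        raise ValueError(f"unknown style: {style}")
    return f"{prompt}, {STYLE_HINTS[style]}"


if __name__ == "__main__":
    print(apply_style("mountain landscape", "watercolor"))
```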
Prompt Enhancement
Automatically enhance prompts with quality keywords and composition guidance:
Usage:
generate_image(
prompt="cat sitting",
enhance=True,
provider="gemini"
)
The enhance parameter:
- Adds quality keywords (high quality, detailed, professional)
- Adds composition guidance for short prompts
- Skips keywords that are already in the prompt
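The three behaviours listed above can be sketched in a few lines. This is an assumption-laden illustration rather than the plugin's implementation: `enhance_prompt` is a hypothetical function, and the short-prompt threshold (`SHORT_PROMPT_WORDS`) and the composition wording (`COMPOSITION_HINT`) are invented for the example.

```python
# Quality keywords named in this README.
QUALITY_KEYWORDS = ("high quality", "detailed", "professional")
# Hypothetical values: the plugin's real threshold and wording are not documented.
SHORT_PROMPT_WORDS = 4
COMPOSITION_HINT = "well-composed"


def enhance_prompt(prompt):
    """Append missing quality keywords, plus a composition hint for short prompts."""
    lowered = prompt.lower()
    # Only add keywords the prompt does not already contain.
    extras = [kw for kw in QUALITY_KEYWORDS if kw not in lowered]
    if len(prompt.split()) < SHORT_PROMPT_WORDS and COMPOSITION_HINT not in lowered:
        extras.append(COMPOSITION_HINT)
    return ", ".join([prompt, *extras])


if __name__ == "__main__":
    print(enhance_prompt("cat sitting"))
```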
Combined Example:
generate_image(
prompt="futuristic city",
style="cyberpunk",
enhance=True,
count=3,
view=True,
provider="gemini"
)
Cost Tracking
All image generations are automatically tracked in a local SQLite database (~/.gptme/imagen_costs.db).
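To make the storage model concrete, here is a minimal sketch of how per-generation costs could be kept in SQLite. The table name, schema, and helper functions are all assumptions for illustration (only the `timestamp`, `prompt`, and `cost_usd` field names appear elsewhere in this README), and the example uses an in-memory database rather than the real file.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open a database and create the hypothetical `generations` table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS generations (
               timestamp TEXT NOT NULL,
               provider  TEXT NOT NULL,
               prompt    TEXT NOT NULL,
               cost_usd  REAL NOT NULL
           )"""
    )
    return conn


def record(conn, timestamp, provider, prompt, cost_usd):
    """Store one generation."""
    conn.execute(
        "INSERT INTO generations VALUES (?, ?, ?, ?)",
        (timestamp, provider, prompt, cost_usd),
    )
    conn.commit()


def total_cost(conn, provider=None):
    """Sum recorded costs, optionally restricted to one provider."""
    query = "SELECT COALESCE(SUM(cost_usd), 0.0) FROM generations"
    params = ()
    if provider is not None:
        query += " WHERE provider = ?"
        params = (provider,)
    return conn.execute(query, params).fetchone()[0]
```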
Query Total Cost
from gptme_imagen.tools.image_gen import get_total_cost
# Get total cost across all providers
total = get_total_cost()
print(f"Total spent: ${total:.2f}")
# Filter by provider
gemini_cost = get_total_cost(provider="gemini")
print(f"Gemini cost: ${gemini_cost:.2f}")
# Filter by date range
cost = get_total_cost(start_date="2024-11-01", end_date="2024-11-30")
print(f"November cost: ${cost:.2f}")
Cost Breakdown
from gptme_imagen.tools.image_gen import get_cost_breakdown
breakdown = get_cost_breakdown()
for provider, cost in breakdown.items():
    print(f"{provider}: ${cost:.2f}")
Generation History
from gptme_imagen.tools.image_gen import get_generation_history
history = get_generation_history(limit=10)
for gen in history:
    print(f"{gen['timestamp']}: {gen['prompt'][:50]}... (${gen['cost_usd']:.3f})")
Cost per image (approximate as of Nov 2024):
- Gemini Imagen-3: $0.04 per image (standard)
- DALL-E 3: $0.04 per image (standard), $0.08 per image (HD)
- DALL-E 2: $0.02 per image
Note: Costs are tracked automatically with each generation and stored locally.
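The per-image prices above make it easy to estimate a request's cost up front: multiply the price by the number of images. The sketch below does exactly that; `PRICE_USD` copies the approximate figures from this README, and `estimate_cost` is a hypothetical helper, not a plugin function.

```python
# Approximate per-image prices (USD) from the list above, as of Nov 2024.
PRICE_USD = {
    ("gemini", "standard"): 0.04,
    ("dalle", "standard"): 0.04,
    ("dalle", "hd"): 0.08,
    ("dalle2", "standard"): 0.02,
}


def estimate_cost(provider, count=1, quality="standard"):
    """Estimate the cost of `count` images (hypothetical helper)."""
    # A KeyError here means the README lists no price for that combination.
    return round(PRICE_USD[(provider, quality)] * count, 2)


if __name__ == "__main__":
    print(f"3 HD DALL-E 3 images: ${estimate_cost('dalle', 3, 'hd'):.2f}")
```

For example, three HD DALL-E 3 images come to 3 × $0.08 = $0.24.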
Phase 3.2 Enhancements (Completed)
Image Variations
Generate variations of existing images (DALL-E 2 only):
generate_variation(
image_path="original.png",
provider="dalle2",
count=4,
view=True
)
Note: Image variations are currently only supported by DALL-E 2. For other providers, use generate_image with descriptive prompts.
Batch Operations
Generate multiple images from a list of prompts efficiently:
batch_generate(
prompts=["sunset over ocean", "mountain landscape", "city skyline"],
provider="gemini",
style="photo",
output_dir="landscapes",
view=True
)
Benefits:
- Process multiple prompts in one call
- Automatic filename generation
- Progress tracking
- Optional viewing of all results
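As an illustration of automatic filename generation, the sketch below derives one file path per prompt from an index and a slug of the prompt text. The naming scheme is an assumption made for this example (the README does not specify the actual scheme), and `batch_paths` is a hypothetical helper.

```python
import re
from pathlib import Path


def batch_paths(prompts, output_dir):
    """Build one output path per prompt (hypothetical naming scheme)."""
    paths = []
    for index, prompt in enumerate(prompts, start=1):
        # Lowercase, replace runs of non-alphanumerics with "-", trim the ends.
        slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
        # Cap the slug length so long prompts still give usable filenames.
        paths.append(Path(output_dir) / f"{index:03d}_{slug[:40]}.png")
    return paths


if __name__ == "__main__":
    prompts = ["sunset over ocean", "mountain landscape", "city skyline"]
    for p in batch_paths(prompts, "landscapes"):
        print(p.as_posix())
```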
Provider Comparison
Compare the same prompt across multiple providers:
compare_providers(
prompt="futuristic city skyline at night",
providers=["gemini", "dalle"],
quality="hd",
view=True
)
Results are saved with provider-specific filenames for easy comparison. Perfect for:
- Evaluating provider strengths
- Choosing the best result for your use case
- A/B testing prompts
Future Enhancements (Phase 4+)
- [ ] Local Stable Diffusion support
- [ ] Image editing with masks (inpainting)
- [ ] Advanced image-to-image transformations