gptme-imagen

Multi-provider image generation for gptme.

Image Generation Plugin for gptme

Overview

The image generation plugin provides a unified interface for generating images from text descriptions across multiple providers, including Google Gemini (Imagen) and OpenAI DALL-E.

Features

Installation

As gptme plugin

Add the following to your gptme.toml (user or project level):

[plugins]
paths = ["path/to/plugins"]
enabled = ["gptme_imagen"]  # Optional: limit which plugins load

As standalone CLI

# Install with CLI support and your preferred provider (pick one)
pip install 'gptme-imagen[cli,gemini]'         # Gemini only
pip install 'gptme-imagen[cli,dalle]'          # DALL-E only
pip install 'gptme-imagen[cli,gemini,dalle]'   # Both providers

# Or install everything
pip install 'gptme-imagen[all]'

Set up API keys:

export GOOGLE_API_KEY="your-key"  # For Gemini
export OPENAI_API_KEY="your-key"  # For DALL-E
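
Before generating, it can help to check which providers actually have a key configured. The following is a minimal sketch, not part of the plugin: the `PROVIDER_KEYS` mapping and the `configured_providers` helper are hypothetical names, and only the two environment variables listed above are assumed.

```python
import os

# Environment variable per provider, taken from the export lines above.
# The mapping and helper below are illustrative, not the plugin's API.
PROVIDER_KEYS = {
    "gemini": "GOOGLE_API_KEY",
    "dalle": "OPENAI_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]
```

Calling `configured_providers()` with no arguments inspects the current process environment; passing a dict makes the check easy to test.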

CLI Usage

The gptme-imagen command works standalone, without the full gptme runtime:

# Generate an image
gptme-imagen generate "a sunset over mountains"

# Choose provider and style
gptme-imagen generate "tech logo" --provider dalle --style flat-design --quality hd

# Generate multiple images
gptme-imagen generate "logo variations" --count 3 --output logos/option.png

# Use reference images (Gemini only)
gptme-imagen generate "change background to beach" --images photo.png

# View costs and history
gptme-imagen cost
gptme-imagen history --limit 20

# List available styles
gptme-imagen styles

Plugin Usage

Basic Generation

generate_image(
    prompt="A modern office workspace with clean design",
    provider="gemini"
)

With Custom Output Path

generate_image(
    prompt="Architecture diagram of microservices",
    provider="gemini",
    output_path="diagrams/architecture.png"
)

High Quality DALL-E

generate_image(
    prompt="Professional logo design for tech startup",
    provider="dalle",
    quality="hd",
    output_path="branding/logo.png"
)

Multiple Options (Phase 1 NEW)

Generate multiple variations for comparison:

generate_image(
    prompt="Modern minimalist logo for tech startup",
    provider="gemini",
    count=3,
    output_path="logos/option.png"
)

Output: logos/option_001.png, logos/option_002.png, logos/option_003.png
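
The numbering shown above can be reproduced with a few lines of path handling. This is a sketch inferred from the example output only: the `numbered_paths` helper is hypothetical, and leaving the path unchanged when `count` is 1 is an assumption rather than documented behavior.

```python
from pathlib import PurePosixPath


def numbered_paths(output_path, count):
    """Derive option_001.png, option_002.png, ... from a base output path."""
    base = PurePosixPath(output_path)
    if count <= 1:
        # Assumption: a single image keeps the path exactly as given.
        return [str(base)]
    return [
        # Insert a zero-padded, 1-based index between the stem and the extension.
        str(base.with_name(f"{base.stem}_{i:03d}{base.suffix}"))
        for i in range(1, count + 1)
    ]
```

For example, `numbered_paths("logos/option.png", 3)` yields the three file names listed in the output line above.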

View Integration (Phase 1 NEW)

Display generated images to the assistant for verification and feedback:

generate_image(
    prompt="UI mockup for dashboard",
    provider="gemini",
    view=True,
    output_path="mockups/dashboard.png"
)

The assistant can see the generated image and provide feedback like "The layout looks good, but the colors could be brighter."

Combined: Multiple Options with View

generate_image(
    prompt="Logo concept with geometric shapes",
    provider="gemini",
    count=3,
    view=True,
    output_path="concepts/logo.png"
)

The assistant sees all 3 variations and can recommend the best one.

Providers

Gemini (Imagen 3)

DALL-E 3

DALL-E 2

Parameters

Use Cases

Output

The tool returns:

Dependencies

Required:

pip install google-genai   # For Gemini
pip install openai         # For DALL-E
pip install requests       # For image downloads

Phase 1 Enhancements (Completed)

Phase 2 Enhancements (Completed)

Style Presets

Apply predefined style presets to enhance your prompts with consistent artistic direction:

Available Styles:

Usage:

generate_image(
    prompt="mountain landscape",
    style="watercolor",
    provider="gemini"
)
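
Conceptually, a style preset can be thought of as extra descriptive text attached to the prompt before it is sent to the provider. The sketch below illustrates that idea only: the preset wording, the `STYLE_PRESETS` table, and the `apply_style` helper are all invented for illustration, and the plugin may combine style and prompt differently.

```python
# Hypothetical preset text; the plugin's real wording is not shown in this README.
STYLE_PRESETS = {
    "watercolor": "watercolor painting, soft washes of color",
    "cyberpunk": "cyberpunk aesthetic, neon lighting",
    "flat-design": "flat design, simple shapes, solid colors",
}


def apply_style(prompt, style=None):
    """Append the preset's descriptive text to the prompt, if a style is given."""
    if style is None:
        return prompt
    try:
        preset = STYLE_PRESETS[style]
    except KeyError:
        known = ", ".join(sorted(STYLE_PRESETS))
        raise ValueError(f"Unknown style {style!r}; expected one of: {known}") from None
    return f"{prompt}, {preset}"
```

Rejecting unknown style names up front, rather than silently ignoring them, keeps a typo from producing an unstyled image without warning.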

Prompt Enhancement

Automatically enhance prompts with quality keywords and composition guidance:

Usage:

generate_image(
    prompt="cat sitting",
    enhance=True,
    provider="gemini"
)

The enhance parameter adds:

Combined Example:

generate_image(
    prompt="futuristic city",
    style="cyberpunk",
    enhance=True,
    count=3,
    view=True,
    provider="gemini"
)

Cost Tracking

All image generations are automatically tracked in a local SQLite database (~/.gptme/imagen_costs.db).

Query Total Cost

from gptme_imagen.tools.image_gen import get_total_cost

# Get total cost across all providers
total = get_total_cost()
print(f"Total spent: ${total:.2f}")

# Filter by provider
gemini_cost = get_total_cost(provider="gemini")
print(f"Gemini cost: ${gemini_cost:.2f}")

# Filter by date range
cost = get_total_cost(start_date="2024-11-01", end_date="2024-11-30")
print(f"November cost: ${cost:.2f}")

Cost Breakdown

from gptme_imagen.tools.image_gen import get_cost_breakdown

breakdown = get_cost_breakdown()
for provider, cost in breakdown.items():
    print(f"{provider}: ${cost:.2f}")

Generation History

from gptme_imagen.tools.image_gen import get_generation_history

history = get_generation_history(limit=10)
for gen in history:
    print(f"{gen['timestamp']}: {gen['prompt'][:50]}... (${gen['cost_usd']:.3f})")

Cost per image (approximate as of Nov 2024):

Note: Costs are tracked automatically with each generation and stored locally.
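
To make the tracking model concrete, here is a minimal sketch of how a local SQLite store could record generations and answer the same kinds of questions as the query helpers above. It is not the plugin's implementation: the table name, column layout, and the `open_db`, `record`, and `total_cost` helpers are assumptions, with column names borrowed from the keys used in the history example (`timestamp`, `prompt`, `cost_usd`). The example uses an in-memory database rather than ~/.gptme/imagen_costs.db.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (assumed) generations table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS generations (
               id INTEGER PRIMARY KEY,
               timestamp TEXT NOT NULL,
               provider TEXT NOT NULL,
               prompt TEXT NOT NULL,
               cost_usd REAL NOT NULL
           )"""
    )
    return conn


def record(conn, timestamp, provider, prompt, cost_usd):
    """Store one generation; timestamp is an ISO 8601 string."""
    conn.execute(
        "INSERT INTO generations (timestamp, provider, prompt, cost_usd) "
        "VALUES (?, ?, ?, ?)",
        (timestamp, provider, prompt, cost_usd),
    )
    conn.commit()


def total_cost(conn, provider=None, start_date=None, end_date=None):
    """Sum cost_usd, optionally filtered by provider and inclusive date range."""
    query = "SELECT COALESCE(SUM(cost_usd), 0.0) FROM generations WHERE 1=1"
    params = []
    if provider is not None:
        query += " AND provider = ?"
        params.append(provider)
    if start_date is not None:
        query += " AND date(timestamp) >= ?"
        params.append(start_date)
    if end_date is not None:
        query += " AND date(timestamp) <= ?"
        params.append(end_date)
    return conn.execute(query, params).fetchone()[0]
```

Comparing on `date(timestamp)` makes the end date inclusive for the whole day, so a range of 2024-11-01 to 2024-11-30 covers generations made at any time on 30 November.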

Phase 3.2 Enhancements (Completed)

Image Variations

Generate variations of existing images (DALL-E 2 only):

generate_variation(
    image_path="original.png",
    provider="dalle2",
    count=4,
    view=True
)

Note: Image variations are currently only supported by DALL-E 2. For other providers, use generate_image with descriptive prompts.

Batch Operations

Generate multiple images from a list of prompts efficiently:

batch_generate(
    prompts=["sunset over ocean", "mountain landscape", "city skyline"],
    provider="gemini",
    style="photo",
    output_dir="landscapes",
    view=True
)

Benefits:

Provider Comparison

Compare the same prompt across multiple providers:

compare_providers(
    prompt="futuristic city skyline at night",
    providers=["gemini", "dalle"],
    quality="hd",
    view=True
)

Results are saved with provider-specific filenames for easy comparison. Perfect for:

Future Enhancements (Phase 4+)