gptme-voice

Version v0.1.0, located at packages/gptme-voice.

Voice interface for gptme agents using OpenAI or xAI Grok Realtime APIs.

Features

Installation

# Install with poetry (from gptme-contrib)
cd packages/gptme-voice
poetry install

# For local mic/speaker testing
poetry install -E local

Usage

Start the server

# Auto-detects agent repo and loads personality
gptme-voice-server

# Use xAI Grok
gptme-voice-server --provider grok

# With debug logging
gptme-voice-server --debug

# Explicit workspace
gptme-voice-server --workspace /path/to/agent-repo

The server auto-detects the agent repo by walking up the directory tree from gptme-contrib until it finds gptme.toml. It then loads the agent's personality files, with ABOUT.md taking priority.
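
The walk-up lookup can be sketched in shell. This is only an illustration of the technique, not the server's actual code: the function name find_agent_repo is hypothetical, and the only detail taken from the description above is the gptme.toml marker file.

```shell
# Hypothetical helper (not part of gptme-voice): walk up from a starting
# directory until a directory containing gptme.toml is found, then print it.
find_agent_repo() {
    dir=$(cd "${1:-.}" && pwd) || return 1
    while :; do
        if [ -f "$dir/gptme.toml" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        # Reached the filesystem root without finding the marker file.
        [ "$dir" = "/" ] && return 1
        dir=$(dirname "$dir")
    done
}
```

If auto-detection picks the wrong repo, the result of a lookup like this could be passed explicitly, for example gptme-voice-server --workspace "$(find_agent_repo .)".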

Connect with local client

# In a separate terminal
gptme-voice-client

Speak into your microphone. The agent responds with its configured personality and can use the subagent tool to interact with its workspace.

Tip: Use headphones so that you can interrupt the agent mid-sentence (see Limitations below).

Receive phone calls via Twilio

  1. Start the server, then expose it at a public URL (e.g. via ngrok):
    gptme-voice-server --port 8080
    ngrok http 8080
    
  2. In the Twilio console, set your phone number's Voice webhook to: https://<your-ngrok-url>/incoming (HTTP POST)
  3. Call the Twilio number; Twilio connects the call to the voice server.
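
Before pointing Twilio at the server, it can help to build the webhook URL and send a test request by hand. The sketch below is an assumption-laden illustration: both function names are hypothetical, the field values are placeholders, and the server's response is not documented here. CallSid, From, and To are standard fields in Twilio's Voice webhook requests, but a server that verifies Twilio request signatures may reject a hand-made request like this one.

```shell
# Hypothetical helper: build the Voice webhook URL from a public base URL,
# tolerating a trailing slash on the base.
webhook_url() {
    printf '%s/incoming\n' "${1%/}"
}

# Hypothetical smoke test: send a Twilio-style form POST to the server.
# Defaults to the local port used in step 1; all values are placeholders.
simulate_incoming() {
    curl -sS -X POST "$(webhook_url "${1:-http://localhost:8080}")" \
        --data-urlencode "CallSid=CA00000000000000000000000000000000" \
        --data-urlencode "From=+15550100000" \
        --data-urlencode "To=+15550100001"
}
```

Running webhook_url with your ngrok base URL prints the value to paste into the Twilio console; simulate_incoming only makes sense while the server is running.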

Place outbound phone calls via Twilio

Set these values in your environment or gptme config:

TWILIO_ACCOUNT_SID=...
TWILIO_AUTH_TOKEN=...
TWILIO_PHONE_NUMBER=...
GPTME_VOICE_PUBLIC_BASE_URL=https://<your-ngrok-url>

Then place a call:

gptme-voice-call +46701234567

Use --dry-run to print the generated TwiML without dialing.
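
A small preflight check can catch missing settings before dialing. This is a sketch under stated assumptions: check_twilio_env is a hypothetical helper, and it only covers the environment-variable route, so values set in the gptme config are not visible to it.

```shell
# Hypothetical preflight check (not part of gptme-voice): report any of the
# four variables listed above that are unset or empty in the current shell.
check_twilio_env() {
    missing=0
    for name in TWILIO_ACCOUNT_SID TWILIO_AUTH_TOKEN \
                TWILIO_PHONE_NUMBER GPTME_VOICE_PUBLIC_BASE_URL; do
        # Indirect lookup of the variable named by $name (POSIX sh).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

The function returns non-zero when anything is missing, so it can gate a dry run, for example check_twilio_env && gptme-voice-call --dry-run +46701234567 (the position of --dry-run relative to the number is an assumption).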

API keys

Keys are loaded from the gptme config (~/.config/gptme/config.toml or config.local.toml).

There is no need to export them as shell environment variables if they're already configured in gptme.

Architecture

Limitations