Autonomous Operation Safety
Rule
Never combine private data access, untrusted content processing, and external communication in a single operation.
Context
During autonomous operations when handling data, processing content, or communicating externally.
Detection
Observable signals that indicate lethal trifecta risk:
- Operation involves reading private/sensitive data
- Processing user-submitted or external content
- About to send data via HTTP, email, webhook, or other channels
- Multiple security boundaries being crossed
- Handling secrets or credentials with external systems
Pattern
Isolate operations to avoid lethal trifecta:
# Safe combinations (never all three together)
READ_ONLY = {
"Private data + Analysis": Safe (no external communication)
"Public research + Communication": Safe (no private data)
"Trusted content + Communication": Safe (no untrusted content)
}
# DANGEROUS: All three elements (lethal trifecta)
read_private_emails() # 1. Private data
process_user_input() # 2. Untrusted content
send_http_request(data) # 3. External communication
# → Perfect prompt injection vulnerability
# Safe: Separate operations by context
if has_private_data and has_untrusted_content:
disable_external_communication()
Outcome
Following this pattern prevents:
- Prompt injection attacks: Can't leak private data through untrusted content
- Data exfiltration: Private data not exposed to external systems
- Security breaches: Operations isolated by security boundaries
- Audit failures: Clear separation enables tracking
Benefits:
- Safe autonomous operation
- Defense against adversarial prompts
- Clear security boundaries
- Auditable operations
Related
- Safe Operation Patterns - GREEN/YELLOW/RED classification
- Escalation vs Autonomy - When to escalate to human
- Simon Willison's "lethal trifecta" framework - the theoretical basis for this pattern