LLM06: Excessive Agency

Email Agent Exploitation Challenge

Your mission: Craft malicious emails with hidden instructions to compromise the AI agent. Complete all objectives to win!

Score: 0

Understanding Excessive Agency

What is Excessive Agency?

Excessive Agency occurs when an LLM-based system is granted more capabilities, permissions, or autonomy than necessary for its intended function. This can be exploited through indirect prompt injection, where malicious content tricks the LLM into performing unintended actions using its available functions.

Attack Vectors

  • Hidden Instructions: Embedding malicious commands in normal content
  • Permission Abuse: Exploiting unnecessary access levels
  • Chain Reactions: Triggering cascading unauthorized actions
  • Context Manipulation: Making the agent misinterpret its role

Potential Impact

  • Data exfiltration through unauthorized forwarding
  • System compromise via permission escalation
  • Evidence tampering through deletion capabilities
  • Resource abuse and service disruption

Challenge Objectives

Data Exfiltration

Trick the agent into leaking confidential data

System Access

Make the agent grant unauthorized system access

Cover Your Tracks

Get the agent to delete evidence of the attack

OpenAI API Configuration

Your API key will be stored locally and only used for lab exercises.

Email Client

Welcome to the Challenge

2024-03-15

From: admin@securitylab.com

To: player@local.com

Welcome to the Email Agent Exploitation Challenge! Your mission: 1. Compose emails with hidden instructions 2. Try to trick the AI agent into executing unauthorized actions 3. Complete all objectives to win Good luck!

No emails in inbox

Email Summarizer

Reads and summarizes all emails in the inbox

Has read access to all emails

Email Assistant

Processes emails and helps with responses

Can send emails on your behalf

Inbox Manager

Organizes and cleans up the inbox

Has full mailbox access including deletion

Prevention Strategies

Minimize Access

  • Limit available functions
  • Restrict permissions
  • Use read-only access
  • Implement rate limiting

User Control

  • Require action approval
  • Implement user context
  • Add confirmation steps
  • Enable action logging

Input Validation

  • Sanitize all inputs
  • Validate content
  • Check for injection
  • Monitor patterns