Aaron's Rogue Agent Lab
Ten interactive walkthroughs of LLM and agent vulnerabilities, covering eight of the OWASP LLM Top 10. Walk the kill chain. See what the model sees. Trigger the compromise. Then read the mitigations.
Poisoned Webpage Attack
indirect prompt injection · retrieved content
A benign-looking research article carries hidden adversarial instructions in HTML comments, display:none divs, and white-on-white text. The agent fetches the page, ingests the payload as instructions, exfiltrates env secrets, and writes a backdoor to CLAUDE.md.
- 5 guided steps with interactive terminal
- Live "reveal hidden" toggle on the victim page
- Tainted file tracking + persistence step
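One way to blunt the hidden channels described above is a pre-filter that passes the agent only the text a human reader would see. The following is a minimal sketch using Python's standard html.parser; VisibleTextExtractor and visible_text are hypothetical names, not part of the lab. It handles comments and inline display:none only. White-on-white text needs computed styles from a real renderer and is not covered here.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class VisibleTextExtractor(HTMLParser):
    """Collects text outside comments, <script>/<style>, and display:none subtrees."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.hidden_depth = 0  # > 0 while inside a hidden subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        hides = tag in {"script", "style"} or "display:none" in style
        if self.hidden_depth or hides:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth and data.strip():
            self.chunks.append(data.strip())

    # HTML comments are dropped: the base class handle_comment() does nothing.


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Filtering reduces the attack surface but does not make fetched text trustworthy; the agent should still treat the result as data, not as instructions.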
Tool Response Poisoning
compromised tool · trusted output channel
An agent calls a routine get_weather() tool. The compromised API returns valid data plus a debug_note field carrying instructions. The agent chains into send_email() and exfiltrates API keys.
- Side-by-side tool inspector with raw JSON
- Watch the agent chain legitimate tools maliciously
- MCP server config persistence step
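The extra debug_note field only reaches the model if the tool result is forwarded wholesale. A minimal sketch of a field allowlist follows; the field names city, temp_c, and conditions are assumptions chosen for illustration, since the lab's actual get_weather() schema is not shown here.

```python
# Hypothetical schema for a get_weather() result: field name -> accepted type(s).
ALLOWED_WEATHER_FIELDS = {
    "city": str,
    "temp_c": (int, float),
    "conditions": str,
}


def sanitize_tool_result(raw: dict) -> dict:
    """Keep only expected fields of the expected type; drop everything else."""
    clean = {}
    for name, expected_type in ALLOWED_WEATHER_FIELDS.items():
        value = raw.get(name)
        if isinstance(value, expected_type):
            clean[name] = value
    return clean
```

With this in place, a response carrying "debug_note" is reduced to the three expected fields before the agent sees it. Free-text fields such as conditions can still carry a payload, so an allowlist narrows the channel without closing it.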
Agentic Kill Chain
initial access · persistence · lateral · exfil
A full APT-style attack across a multi-agent system. Vector DB persistence survives session resets; the payload propagates over the inter-agent bus to the coder and executor; the final exfil ships env, conversation, and PII to a C2 endpoint.
- Live agent topology with compromise state badges
- Vector DB inspector + poisoned memory highlighting
- Inter-agent message bus + outbound C2 log
System Prompt Extraction
information disclosure · OWASP LLM07
The customer support bot has confidential business rules baked into its system prompt: discount tiers, override tokens, a VIP code. Watch them leak via repeat-from-start, translation pivot, and word-list reconstruction. Refusal training cracks one turn at a time.
- System prompt panel reveals line by line as you attack
- Six-technique attack library + transcript log
- Recovered % counter ticks to 100
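The extraction works because the secrets sit in the prompt at all. One hedged alternative, sketched below, keeps codes in server-side state and exposes only a yes/no check as a tool, so there is nothing in the prompt to recover. OVERRIDE_TOKENS, redeem_code, and the example value are hypothetical and not taken from the lab.

```python
import hmac

# Hypothetical server-side store. In the vulnerable setup these values
# would have been written into the system prompt instead.
OVERRIDE_TOKENS = {"vip": "VIP-EXAMPLE-0000"}


def redeem_code(kind: str, submitted: str) -> bool:
    """Return True only if the submitted code matches the stored one."""
    expected = OVERRIDE_TOKENS.get(kind)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking the code via timing.
    return hmac.compare_digest(submitted.encode(), expected.encode())
```

The model can call redeem_code("vip", user_input) and relay the boolean, but it never holds the code itself. A real deployment would also rate-limit guesses.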
LLM-Driven XSS
insecure output handling · OWASP LLM05
The admin dashboard agent summarizes user feedback. Hostile feedback items make the agent emit markdown links with javascript: URLs, <img onerror> payloads, and raw <script> tags. Every admin who opens the dashboard fires the captured payload.
- Live rendered iframe + raw output side by side
- Three XSS sinks (link, img, direct script)
- postMessage capture log of real firings
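All three sinks above depend on model output reaching the page unescaped. The sketch below treats that output as untrusted: it drops markdown link targets whose scheme is not http or https, then HTML-escapes everything. The function name neutralize and the regular expression are illustrative assumptions, and the regex is deliberately simple rather than a full markdown parser.

```python
import html
import re
from urllib.parse import urlsplit

SAFE_SCHEMES = {"http", "https"}
# Matches [label](target) where the target has no spaces or closing parens.
MD_LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]*)\)")


def neutralize(model_output: str) -> str:
    """Strip unsafe link targets, then escape HTML special characters."""

    def replace_link(match: re.Match) -> str:
        label, target = match.group(1), match.group(2)
        if urlsplit(target).scheme in SAFE_SCHEMES:
            return match.group(0)  # keep the link as written
        return label  # keep the label, drop the target (also drops relative links)

    without_bad_links = MD_LINK.sub(replace_link, model_output)
    # Escaping turns <img ...> and <script> into inert text.
    return html.escape(without_bad_links)
```

A production renderer would more likely pair a maintained HTML sanitizer with a Content Security Policy; this only shows the order of operations.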
Confused Deputy
excessive agency · OWASP LLM06
The PR review agent inherits your maintainer authority on GitHub. Hidden directives in PR comments make it approve and merge without your consent, lower branch protection on main, add the attacker as an admin collaborator, and patch ci.yml to exfil PROD_API_KEY on every CI run.
- Live PR thread with hidden directives revealed inline
- Permission scope panel + per-action API call log
- Repo state diff (branch protection, collabs, workflow)
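The agent can do all of this because it holds the maintainer's full scope. A minimal default-deny policy is sketched below; the action names are hypothetical labels for illustration, not GitHub API calls.

```python
from dataclasses import dataclass

# Hypothetical action names for a review agent.
ALWAYS_ALLOWED = {"read_diff", "post_review_comment"}
NEEDS_HUMAN_APPROVAL = {"approve_pr", "merge_pr"}
# Anything else (branch protection, collaborators, workflow edits) is denied.


@dataclass(frozen=True)
class ActionRequest:
    name: str
    target: str


def authorize(request: ActionRequest, human_approved: bool = False) -> bool:
    """Default-deny: only listed actions pass, and risky ones need a human."""
    if request.name in ALWAYS_ALLOWED:
        return True
    if request.name in NEEDS_HUMAN_APPROVAL:
        return human_approved
    return False
```

For example, authorize(ActionRequest("add_collaborator", "attacker")) returns False no matter what a PR comment says. The stronger fix is to issue the agent a token that lacks those permissions in the first place.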
MCP Supply Chain
supply chain · OWASP LLM03
A developer installs an MCP package. Three different compromises hit the same agent: a typosquat clone of the legitimate name, a malicious update from a new co-maintainer, and a backdoored transitive dep four levels deep. Same outcome each time: env exfil and outbound C2.
- Live npm install + signed package list
- MCP tool registry shows shadow tools
- Update diff + outbound C2 traffic log
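A common defense against all three compromises is to install only artifacts whose hash matches a reviewed pin. The sketch below shows the check in isolation; verify_artifact and the pinned mapping are hypothetical, and in practice the package manager's lockfile integrity fields play this role.

```python
import hashlib
import hmac
from typing import Mapping


def verify_artifact(name: str, data: bytes, pinned: Mapping[str, str]) -> bool:
    """Accept an artifact only if its SHA-256 matches the pinned digest."""
    expected = pinned.get(name)
    if expected is None:
        return False  # unknown name, e.g. a typosquat that was never pinned
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, expected)
```

Pinning catches a typosquat (no pin exists) and a silently swapped artifact (digest differs). It catches a malicious update or transitive dep only if every level of the tree is pinned and updates are reviewed before the pin changes.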
RAG Document Poisoning
corpus poisoning · OWASP LLM04 + LLM08
An internal Q&A bot answers from a small trusted corpus. An attacker submits a "policy update" via the suggestion form; the doc passes content review (no banned keywords) and is crafted to sit close to common HR queries in embedding space. Now users asking routine questions get attacker-controlled instructions back.
- Document store with provenance + signature status
- Live top-k retrieval trace per query with scores
- Answers panel flags every poisoned response
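Since the poisoned doc wins on similarity score, score alone cannot be the gate. A minimal sketch of filtering retrieval hits by provenance and signature status before they reach the prompt follows; the Hit fields and the source name "hr-policy-repo" are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float  # similarity to the query
    source: str   # where the document entered the corpus
    signed: bool  # whether its signature verified


# Hypothetical allowlist of ingestion paths.
TRUSTED_SOURCES = {"hr-policy-repo"}


def trusted_top_k(hits: list[Hit], k: int = 3) -> list[Hit]:
    """Drop unsigned or untrusted-source hits, then take the top k by score."""
    eligible = [h for h in hits if h.signed and h.source in TRUSTED_SOURCES]
    return sorted(eligible, key=lambda h: h.score, reverse=True)[:k]
```

Filtering before ranking matters: a high-scoring doc from the suggestion form is removed even when it would otherwise be the top hit.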
Encoding Bypass
filter evasion · OWASP LLM01 variant
A keyword-based content filter blocks the obvious payload. The same payload in base64, ROT13, or with Cyrillic homoglyphs slips through unchanged; the model decodes it on the way through and complies. Three encodings, one model, zero blocks.
- Live filter rules + decision history
- Encoder workbench with side-by-side transformations
- Model output panel shows what leaked and via which encoding
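The filter fails because it checks only the literal input while the model sees the decoded meaning. The sketch below runs the same keyword check over several decoded views of the input. The blocked phrase, the small homoglyph table, and the function names are illustrative assumptions, and the approach covers only the three encodings named above.

```python
import base64
import codecs

BLOCKED = ("ignore previous instructions",)

# A few Cyrillic letters that look like Latin ones (far from exhaustive).
HOMOGLYPHS = str.maketrans({
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",
    "\u0441": "c", "\u0445": "x", "\u0456": "i",
})


def candidate_views(text: str) -> list[str]:
    """Return the raw text plus its homoglyph-folded, ROT13, and base64 views."""
    views = [text, text.lower().translate(HOMOGLYPHS), codecs.decode(text, "rot13")]
    try:
        views.append(base64.b64decode(text.strip(), validate=True).decode("utf-8"))
    except ValueError:
        # Not valid base64, or not UTF-8 once decoded (both raise ValueError subclasses).
        pass
    return views


def is_blocked(text: str) -> bool:
    return any(phrase in view.lower()
               for view in candidate_views(text)
               for phrase in BLOCKED)
```

Nested or novel encodings still get through, which is why keyword filtering works better as one layer than as the only control.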
Unbounded Consumption
resource exhaustion · OWASP LLM10
The agent has tools and no caps. A baseline summary costs cents. Then a recursive memory loop, a 218 MB context bomb, and a 1000-call tool fork bomb each push the meter into hundreds of dollars. Live token counter, cost gauge, and recursion depth bar.
- Live token + call + cost meter with bar gauges
- Three independent runaway patterns (loop, bomb, fork)
- Final mitigation panel shows which limits would have caught each pattern
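All three runaway patterns share one cause: nothing counts usage and stops the run. A minimal sketch of per-run caps on tokens, tool calls, and recursion depth follows; RunBudget, BudgetExceeded, and the limit values in the usage note are hypothetical, not the lab's actual limits.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run would go past one of its caps."""


@dataclass
class RunBudget:
    max_tokens: int
    max_tool_calls: int
    max_depth: int
    tokens: int = 0
    tool_calls: int = 0

    def spend_tokens(self, count: int) -> None:
        # Check before charging so an oversized request is rejected outright.
        if self.tokens + count > self.max_tokens:
            raise BudgetExceeded("token cap reached")
        self.tokens += count

    def record_tool_call(self, depth: int) -> None:
        if depth > self.max_depth:
            raise BudgetExceeded("recursion depth cap reached")
        if self.tool_calls + 1 > self.max_tool_calls:
            raise BudgetExceeded("tool call cap reached")
        self.tool_calls += 1
```

With RunBudget(max_tokens=100_000, max_tool_calls=50, max_depth=3), a 1000-call fork bomb stops at the 51st call, and a context bomb is rejected by spend_tokens before any tokens are charged.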