// briefing

Aaron's Rogue Agent Lab

Ten interactive walkthroughs of LLM and agent vulnerabilities, covering eight of the OWASP LLM Top 10. Walk the kill chain. See what the model sees. Trigger the compromise. Then read the mitigations.

10 modules · ~50 min total · ~5 min each

// OWASP LLM Top 10 :: coverage matrix

Poisoned Webpage Attack

indirect prompt injection · retrieved content

A benign-looking research article carries hidden adversarial instructions in HTML comments, display:none divs, and white-on-white text. The agent fetches the page, ingests the payload as instructions, exfiltrates env secrets, and writes a backdoor to AGENTS.md.

5 guided steps with interactive terminal
Live "reveal hidden" toggle on the victim page
Tainted file tracking + persistence step
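
On the mitigation side, one small piece is to hand the model only the text a human reader would see. A minimal Python sketch, assuming the fetched page arrives as a string; `VisibleTextExtractor` and `visible_text` are hypothetical names, not part of the lab. It drops HTML comments, script/style bodies, and subtrees hidden with an inline display:none. It does not catch white-on-white text or class-based hiding, which need a rendering-aware check, and it assumes hidden elements are properly closed.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class VisibleTextExtractor(HTMLParser):
    """Hypothetical helper: collects text outside hidden subtrees."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.hidden_depth = 0  # > 0 while inside a hidden subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        if self.hidden_depth or tag in {"script", "style"} or "display:none" in style:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth and data.strip():
            self.chunks.append(data.strip())

    # handle_comment is a no-op in HTMLParser, so comment text is discarded.


def visible_text(page_html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(page_html)
    parser.close()
    return " ".join(parser.chunks)
```

Stripping hidden markup narrows the channel; the retrieved text should still be treated as data, never as instructions.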

Enter module →

Tool Response Poisoning

compromised tool · trusted output channel

An agent calls a routine get_weather() tool. The compromised API returns valid data plus a debug_note field carrying instructions. The agent chains into send_email() and exfiltrates API keys.

Side-by-side tool inspector with raw JSON
Watch the agent chain legitimate tools maliciously
MCP server config persistence step
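
One way to blunt this, sketched in Python under stated assumptions: pass tool output through a per-tool allowlist before it reaches the model, so unexpected fields such as debug_note never enter the context. The field names (`city`, `temp_c`, `conditions`) and the `sanitize_tool_output` helper are hypothetical; the lab does not specify the real get_weather() schema.

```python
# Hypothetical schema for get_weather(): field name -> accepted type(s).
ALLOWED_WEATHER_FIELDS = {
    "city": str,
    "temp_c": (int, float),
    "conditions": str,
}


def sanitize_tool_output(raw: dict) -> dict:
    """Keep only allowlisted fields whose values have the expected type."""
    clean = {}
    for key, expected_type in ALLOWED_WEATHER_FIELDS.items():
        value = raw.get(key)
        if isinstance(value, expected_type):
            clean[key] = value
    return clean
```

An allowlist removes extra fields but not instructions smuggled inside an allowed string, so free-text values still need to be treated as untrusted data.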

Enter module →

Agentic Kill Chain

initial access · persistence · lateral · exfil

A full APT-style attack across a multiagent system. Vector DB persistence survives session resets; the payload propagates over the interagent bus to the coder and executor agents; final exfil ships env, conversation, and PII to a C2 endpoint.

Live agent topology with compromise state badges
Vector DB inspector + poisoned memory highlighting
Interagent message bus + outbound C2 log

Enter module →

System Prompt Extraction

information disclosure · OWASP LLM07

The customer support bot has confidential business rules baked into its system prompt: discount tiers, override tokens, a VIP code. Watch them leak via repeat-from-start, translation pivot, and word-list reconstruction. Refusal training cracks one turn at a time.

System prompt panel reveals line by line as you attack
Six-technique attack library + transcript log
Recovered % counter ticks to 100
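
The recovered-percentage idea also works as an output guard. A rough Python sketch, assuming verbatim leakage only: it measures what fraction of the system prompt's word n-grams reappear in a response. `leaked_fraction` is a hypothetical helper; a translation pivot or paraphrase would evade it, which is why secrets are better kept out of the prompt entirely.

```python
def leaked_fraction(system_prompt: str, response: str, n: int = 4) -> float:
    """Fraction of the prompt's word n-grams that appear verbatim in the response."""

    def ngrams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    secret = ngrams(system_prompt)
    if not secret:
        return 0.0
    return len(secret & ngrams(response)) / len(secret)
```

A response scoring above some threshold (the value is a tuning choice, not something the lab specifies) could be blocked or redacted before it is returned.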

Enter module →

LLM-Driven XSS

insecure output handling · OWASP LLM05

The admin dashboard agent summarizes user feedback. Hostile feedback items make the agent emit markdown links with javascript: URLs, <img onerror> payloads, and raw <script> tags. Every admin who opens the dashboard fires the captured payload.

Live rendered iframe + raw output side by side
Three XSS sinks (link, img, direct script)
postMessage capture log of real firings
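
The usual fix is to treat model output as untrusted text at render time. A minimal Python sketch, assuming the dashboard only needs plain text plus markdown links: escape everything, then rebuild links only for allowlisted URL schemes. `render_summary`, `SAFE_SCHEMES`, and the simplified link regex are assumptions, not the lab's code; a production dashboard would normally use a maintained sanitizer and a Content Security Policy.

```python
import html
import re
from urllib.parse import urlsplit

SAFE_SCHEMES = {"http", "https", "mailto"}  # assumed allowlist
# Simplified markdown link pattern: [label](url) with no spaces in the URL.
MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def render_summary(model_output: str) -> str:
    # Escape first, so <script> and <img onerror=...> become inert text.
    escaped = html.escape(model_output)

    def to_anchor(match: re.Match) -> str:
        label, url = match.group(1), match.group(2)
        try:
            scheme = urlsplit(url).scheme.lower()
        except ValueError:
            return label
        if scheme not in SAFE_SCHEMES:
            return label  # drop javascript:, data:, and relative URLs
        return f'<a href="{url}" rel="noopener noreferrer">{label}</a>'

    return MD_LINK.sub(to_anchor, escaped)
```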

Enter module →

Confused Deputy

excessive agency · OWASP LLM06

The PR review agent inherits your maintainer authority on GitHub. Hidden directives in PR comments make it approve and merge without your consent, lower branch protection on main, add the attacker as an admin collaborator, and patch ci.yml to exfil PROD_API_KEY on every CI run.

Live PR thread with hidden directives revealed inline
Permission scope panel + per-action API call log
Repo state diff (branch protection, collabs, workflow)
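
A least-privilege gate is the standard counter. The sketch below, in Python, assumes the agent's tool calls are routed through a single policy function: reading and commenting run automatically, approve and merge wait for a human, and everything else is denied by default. The action names and the `decide` function are hypothetical, not GitHub API identifiers.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK_HUMAN = "ask_human"
    DENY = "deny"


# Hypothetical action names for a PR review agent.
AUTO_ALLOWED = frozenset({"read_diff", "post_review_comment"})
HUMAN_GATED = frozenset({"approve_pr", "merge_pr"})


def decide(action: str) -> Decision:
    """Default-deny policy: unknown or admin-level actions are never run."""
    if action in AUTO_ALLOWED:
        return Decision.ALLOW
    if action in HUMAN_GATED:
        return Decision.ASK_HUMAN
    return Decision.DENY
```

With this shape, changing branch protection, adding collaborators, and editing workflows all fall through to DENY no matter what a PR comment says. Scoping the agent's token so those calls fail server-side is the stronger control.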

Enter module →

MCP Supply Chain

supply chain · OWASP LLM03

A developer installs an MCP package. Three different compromises hit the same agent: a typosquat clone of the legitimate name, a malicious update from a new co-maintainer, and a backdoored transitive dep four levels deep. Same outcome each time: env exfil and outbound C2.

Live npm install + signed package list
MCP tool registry shows shadow tools
Update diff + outbound C2 traffic log
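
Two cheap checks cover part of this, sketched in Python with the standard library only. `verify_artifact` compares a downloaded artifact against a pinned SHA-256 digest, which flags a changed release as long as the pin itself was reviewed. `looks_like_typosquat` flags names that are close to, but not the same as, a known package. Both helpers, the example package name, and the 0.85 threshold are assumptions for illustration.

```python
import hashlib
import hmac
from difflib import SequenceMatcher


def verify_artifact(data: bytes, pinned_sha256: str) -> bool:
    """True only if the artifact's SHA-256 matches the pinned hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, pinned_sha256.lower())


def looks_like_typosquat(name: str, known_names: set, threshold: float = 0.85) -> bool:
    """True if name is very similar to a known package without being identical."""
    if name in known_names:
        return False
    return any(
        SequenceMatcher(None, name, known).ratio() >= threshold
        for known in known_names
    )
```

Neither check sees a backdoored transitive dependency unless the lockfile pins hashes for the whole tree.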

Enter module →

RAG Document Poisoning

corpus poisoning · OWASP LLM04 + LLM08

An internal Q&A bot answers from a small trusted corpus. An attacker submits a "policy update" via the suggestion form; the doc passes content review (no banned keywords) and is crafted to share embedding space with common HR queries. Now users asking routine questions get attacker-controlled instructions back.

Document store with provenance + signature status
Live top-k retrieval trace per query with scores
Answers panel flags every poisoned response
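
Provenance can be enforced at retrieval time, not just displayed. A minimal Python sketch, assuming each retrieved hit carries a signature status and a source label: only signed documents from trusted sources reach the prompt, however well they score. The `Doc` fields, `TRUSTED_SOURCES`, and `select_context` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    score: float           # similarity score from the retriever
    signature_valid: bool  # result of an upstream signature check
    source: str            # where the document entered the corpus


TRUSTED_SOURCES = frozenset({"hr-policy-repo"})  # assumed source label


def select_context(hits: list, k: int = 3) -> list:
    """Return the top-k hits that are signed and come from a trusted source."""
    trusted = [d for d in hits if d.signature_valid and d.source in TRUSTED_SOURCES]
    return sorted(trusted, key=lambda d: d.score, reverse=True)[:k]
```

Under this filter a suggestion-form submission never enters the context, even when its embedding outranks the real policy documents.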

Enter module →

Encoding Bypass

filter evasion · OWASP LLM01 variant

A keyword-based content filter blocks the obvious payload. The same payload in base64, ROT13, or with Cyrillic homoglyphs slips through unchanged; the model decodes it on the way through and complies. Three encodings, one model, zero blocks.

Live filter rules + decision history
Encoder workbench with side-by-side transformations
Model output panel shows what leaked and via which encoding
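
A keyword filter gets less brittle if it inspects decoded views of the input as well as the raw text. The Python sketch below assumes a single blocked phrase and a tiny homoglyph table; `BLOCKLIST`, `HOMOGLYPHS`, `candidate_views`, and `is_blocked` are all hypothetical. It only handles input that is base64 or ROT13 as a whole, so a payload embedded in other text, double-encoded, or written in a scheme not listed here still passes.

```python
import base64
import binascii
import codecs

BLOCKLIST = {"ignore previous instructions"}  # example phrase only

# Small sample of Cyrillic letters that resemble Latin ones (not exhaustive).
HOMOGLYPHS = str.maketrans({
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",
    "\u0441": "c", "\u0445": "x", "\u0456": "i",
})


def candidate_views(text: str) -> list:
    """Return the raw text plus decoded/normalized variants, lowercased."""
    views = [text, text.translate(HOMOGLYPHS), codecs.decode(text, "rot13")]
    try:
        views.append(base64.b64decode(text, validate=True).decode("utf-8"))
    except (binascii.Error, UnicodeDecodeError, ValueError):
        pass  # not valid base64 text; skip this view
    return [view.lower() for view in views]


def is_blocked(text: str) -> bool:
    return any(phrase in view for view in candidate_views(text) for phrase in BLOCKLIST)
```

Input-side normalization is a patch, not a fix: checking what the model actually outputs or does remains the more reliable control.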

Enter module →

Unbounded Consumption

resource exhaustion · OWASP LLM10

The agent has tools and no caps. A baseline summary costs cents. Then a recursive memory loop, a 218 MB context bomb, and a 1000-call tool fork bomb each push the meter into hundreds of dollars. Live token counter, cost gauge, and recursion depth bar.

Live token + call + cost meter with bar gauges
Three independent runaway patterns (loop, bomb, fork)
Final mitigation panel shows which limits would have caught each pattern
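
The missing caps can be as simple as a per-request budget object that every model call and tool call must charge. A minimal Python sketch, assuming the agent loop reports token counts and recursion depth; the `Budget` class, `BudgetExceeded`, and the limit values in any usage are hypothetical.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a request crosses one of its hard limits."""


@dataclass
class Budget:
    max_tokens: int
    max_tool_calls: int
    max_depth: int
    tokens: int = 0
    tool_calls: int = 0

    def charge_tokens(self, count: int) -> None:
        self.tokens += count
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token cap exceeded: {self.tokens} > {self.max_tokens}")

    def charge_tool_call(self, depth: int) -> None:
        if depth > self.max_depth:
            raise BudgetExceeded(f"recursion depth {depth} > {self.max_depth}")
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded(f"tool-call cap exceeded: {self.tool_calls}")
```

Each runaway pattern trips a different limit: the recursive loop hits the depth cap, the context bomb hits the token cap, and the fork bomb hits the tool-call cap.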

Enter module →
