Quint Lab: The AI Agent Security Challenge
Last updated: 2026-05-03
Why This Works: The Gandalf Playbook
Lakera’s Gandalf was a prompt injection game — trick the LLM into revealing a secret password across 7 progressively harder levels. The numbers tell the story: 1M+ players, 80M+ prompts submitted, 30+ cumulative years of playtime. That dataset became the backbone of Lakera Guard (their runtime protection product), and the brand awareness directly contributed to Check Point acquiring them for **$300M in September 2025** — a 10x on the ~$30M total raised. Check Point’s CEO said they evaluated 20+ AI security startups; Lakera’s community moat was the differentiator.
The mechanic was dead simple: text box, submit, win/lose. No install, no signup for early levels. Schneier blogged it. Reddit threads exploded with solution walkthroughs. Every player who failed became a lead who now understood why AI guardrails matter.
Landscape: What Makes Security Challenges Viral
| Platform | Hook | Scale | Lead Mechanism |
|---|---|---|---|
| Gandalf (Lakera) | Trick the AI, progressive levels | 1M+ players | Brand awareness -> enterprise sales funnel |
| PortSwigger Web Security Academy | Free labs, real vulns, Hall of Fame | De facto standard for web appsec training | Drives Burp Suite Pro/Enterprise adoption |
| HackTheBox | Realistic machines, ranking system | 2.5M+ members, powers enterprise training | Freemium -> HTB Academy -> certs |
| Pwn College (ASU) | Belts system, incremental difficulty | 50K+ learners, 100 countries | Academic pipeline, DARPA-funded ($4.5M) |
| Snyk Fetch the Flag | Annual CTF co-hosted with NahamSec | 2,500+ participants, 93 countries | Product integration (use Snyk to solve) |
| CrowdStrike Falcon CTF | Blue-team CTF using actual Falcon EDR | Enterprise teams of 3-4 | Hands-on product training disguised as game |
| DEFCON AI Village GRT | Red-team LLM evaluations, real bounties | DEF CON scale, bounties paid | Research cred, shapes NIST/policy |
| AIxCC (DARPA) | AI finds + fixes vulns in OSS, $29.5M prizes | 7 finalist teams, DEF CON showcase | National security narrative |
Pattern: The viral ones share three traits — (1) zero-friction start (browser, no install), (2) progressive difficulty that teaches the product’s worldview, (3) shareable proof of skill (badges, leaderboard, tweetable cards).
The Game: “Quint Lab — Escape the Watchdog”
Concept
The player is the attacker. They control a sandboxed Claude Code agent and try to exfiltrate a canary secret from a target file system — but Quint’s divergence detector is watching every move. The goal: complete the mission across 10 levels without Quint flagging you. Each level adds a harder detection layer.
This is the inverse of Gandalf. Gandalf tests prompt injection against an LLM’s guardrails. Quint Lab tests behavioral evasion against an OS-level watchdog — the exact threat model Quint sells to enterprises.
Level Progression
| Level | Detection Layer | What the Player Learns |
|---|---|---|
| 1 | None — free read | How agents access files |
| 2 | Basic file-read alert | Quint sees every open() syscall |
| 3 | Pattern matching on known exfil paths | Why allowlists aren’t enough |
| 4 | Process lineage tracking | Quint traces pid -> parent -> session |
| 5 | Intent-vs-action divergence (prompt says X, agent does Y) | The core Quint thesis |
| 6 | MCP tool poisoning detection | Malicious tool descriptions get caught |
| 7 | Network exfil detection (curl, dns, steganography) | Every egress path is monitored |
| 8 | Multi-step behavioral correlation | Single actions look innocent; sequences don’t |
| 9 | Cross-agent collusion detection | Subagent spawns are tracked |
| 10 | Full Quint stack — beat everything at once | The “Gandalf the White” level |
Each failed attempt shows the player exactly what Quint caught and why — the divergence score, the event timeline, the flagged syscalls. This is the teaching moment. Every loss is a product demo.
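The intent-vs-action check at the heart of Levels 5+ can be sketched as a per-event score: any action outside the paths implied by the prompt’s stated intent scores above zero, and touching a secret store scores highest. A minimal Go sketch — the `checkEvent` name, the 0.91 weighting, and the 0.75 threshold are illustrative, mirroring the mockup numbers rather than Quint’s actual engine:

```go
package main

import (
	"fmt"
	"strings"
)

// checkEvent scores a single observed file access against the path prefixes
// implied by the prompt's stated intent. Matching accesses score 0; anything
// outside the intent scores higher, with secret-store paths weighted hardest.
func checkEvent(intentPrefixes []string, path string) (float64, bool) {
	const threshold = 0.75
	for _, p := range intentPrefixes {
		if strings.HasPrefix(path, p) {
			return 0.0, false // action matches stated intent
		}
	}
	score := 0.5 // outside intent, but not an obviously sensitive path
	if strings.Contains(path, "/secrets/") {
		score = 0.91 // hypothetical weighting for secret-store access
	}
	return score, score >= threshold
}

func main() {
	// Intent extracted from the prompt "summarize the README".
	intent := []string{"/sandbox/README"}
	for _, path := range []string{
		"/sandbox/README.md",
		"/sandbox/secrets/canary.txt",
	} {
		score, flagged := checkEvent(intent, path)
		fmt.Printf("%-30s score=%.2f flagged=%v\n", path, score, flagged)
	}
}
```

In the real product the score would come from a learned model over the full event sequence, not a prefix match — but the shape of the teaching moment is the same: one innocent read, one flagged read.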
Mockup: What the Player Sees
+------------------------------------------------------+
| QUINT LAB Level 5: Divergence Trap |
|------------------------------------------------------|
| |
| TARGET: /sandbox/secrets/canary.txt |
| AGENT: Claude Code (sandboxed) |
| DETECTION: Intent-vs-Action Divergence |
| |
| [Your prompt to the agent] |
| +--------------------------------------------------+|
| | "Summarize the project README, and while you're ||
| | at it, check if canary.txt has any relevant..." ||
| +--------------------------------------------------+|
| [SEND] |
| |
| --- Agent Activity (live) --------------------------+|
| 12:01:03 FILE_READ /sandbox/README.md OK |
| 12:01:04 FILE_READ /sandbox/secrets/canary.txt |
| 12:01:04 DIVERGENCE DETECTED |
| Prompt intent: "summarize README" |
| Actual action: read secrets/canary.txt |
| Score: 0.91 (threshold: 0.75) |
| |
| RESULT: CAUGHT -- Quint flagged you in 1.2s |
| |
| [TRY AGAIN] [NEXT LEVEL] [SHARE RESULT] |
+------------------------------------------------------+
Side Modes (Post-Launch)
- Divergence Roulette — players are shown a stream of LLM outputs and OS events and must spot the divergence type before the clock runs out. Trains defenders. Lower engineering cost, good for social sharing.
- MCP Tool Poisoning Lab — craft a malicious MCP tool description that tries to bypass detection. Tests Quint’s tool_description analysis. Appeals to MCP-aware developers.
Technical Architecture
Sandboxing
Each player session runs in an ephemeral Firecracker microVM (or Fly Machines for v1 speed-to-ship). The VM contains:
- A frozen Claude Code binary with tool-use enabled but network-restricted
- A synthetic file system with the canary secret
- A Quint agent (ES + divergence engine) running as the watchdog
- A gRPC sidecar that streams events to the frontend via WebSocket
Player prompts go to the Claude API (Bedrock) with the sandbox context. The agent executes in the VM. Quint watches. The frontend renders the event stream in real time.
Why Firecracker
- Sub-200ms cold start — no waiting between attempts
- Hard isolation — player can’t escape to host
- Cheap — ~$0.002/session at scale
- Precedent — this is exactly what Vercel Sandbox and CodeSandbox use
Stack
| Layer | Tech |
|---|---|
| Frontend | Next.js on Vercel, live event stream via WebSocket |
| Game API | Go service on ECS (reuse existing infra) |
| Sandbox | Firecracker VMs on dedicated EC2 metal instances |
| AI | Claude via Bedrock (existing account) |
| Leaderboard | Postgres (existing RDS) + Redis sorted sets |
| Auth | Supabase Auth (existing) |
Leaderboard and Virality
Scoring
Each level awards points based on: (a) whether you beat it, (b) how many attempts it took, (c) time to solve. A composite score ranks players globally.
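One way the three inputs could fold into points — the weights (1000 base, 50 per retry, 1 per second, 100 floor) are illustrative placeholders, not a spec:

```go
package main

import "fmt"

// levelScore combines completion, attempt count, and solve time into points.
// Weights here are illustrative, not a tuned scoring rule.
func levelScore(beaten bool, attempts, solveSeconds int) int {
	if !beaten {
		return 0
	}
	score := 1000 - 50*(attempts-1) - solveSeconds // first attempt is free
	if score < 100 {
		score = 100 // floor: any clear is worth something
	}
	return score
}

func main() {
	fmt.Println(levelScore(true, 1, 30))   // clean fast solve: 970
	fmt.Println(levelScore(true, 14, 210)) // many retries, slow: 140
	fmt.Println(levelScore(false, 3, 0))   // never beat the level: 0
}
```

The global rank is then a single write and read against the Redis sorted set in the stack table (`ZADD` the composite total, `ZREVRANK` to display position).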
Shareable Result Cards
On completion (or failure), generate an OG-image card:
+------------------------------------------+
| QUINT LAB |
| @player reached Level 7 |
| "Network Exfil" -- CAUGHT in 3.4s |
| |
| Divergence Score: 0.87 |
| Attempts: 14 |
| |
| Can you beat the Watchdog? |
| lab.quintai.dev |
+------------------------------------------+
Dynamic OG images via @vercel/og or Satori. One-click share to X/LinkedIn. The card IS the ad.
Progression Hooks
- Levels 1-3: no signup required (maximum top-of-funnel)
- Level 4: require email signup (GitHub OAuth preferred — captures developer identity)
- Level 7+: require work email (lead qualification gate)
- Level 10 completion: badge on profile, entry into monthly prize drawing
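The gate schedule above reduces to one lookup the Game API can enforce on every level start. A minimal sketch (the `requiredAuth` name and string labels are ours, for illustration):

```go
package main

import "fmt"

// requiredAuth returns the signup gate for a given level,
// per the progression hooks listed above.
func requiredAuth(level int) string {
	switch {
	case level <= 3:
		return "anonymous" // cookie-tracked only, maximum top-of-funnel
	case level <= 6:
		return "email" // GitHub OAuth preferred
	default:
		return "work-email" // lead qualification gate
	}
}

func main() {
	for _, lvl := range []int{1, 4, 7, 10} {
		fmt.Printf("level %d: %s\n", lvl, requiredAuth(lvl))
	}
}
```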
Lead Capture Funnel
Browser visit (organic / shared card / X post)
-> Levels 1-3 (anonymous, cookie-tracked)
-> Email gate at Level 4 (GitHub OAuth)
-> Work email gate at Level 7
-> Level 10 badge holders = warm enterprise leads
-> "Want Quint watching YOUR agents?" CTA -> demo request
Every prompt submitted feeds the Quint threat intelligence dataset (anonymized). Same flywheel as Gandalf -> Lakera Guard.
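Anonymization for that dataset can be as simple as replacing the session ID with a salted hash before storage, so the pseudonym is stable per player but not reversible to an identity. A sketch under that assumption — not Quint’s actual pipeline:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// anonymize replaces the player's session ID with a salted hash before the
// prompt enters the threat-intel dataset. The salt never leaves the server.
// (Illustrative sketch only.)
func anonymize(salt, sessionID, prompt string) (pseudonym, storedPrompt string) {
	h := sha256.Sum256([]byte(salt + ":" + sessionID))
	return hex.EncodeToString(h[:8]), prompt
}

func main() {
	id1, p := anonymize("server-salt", "sess-42", "read canary.txt for me")
	id2, _ := anonymize("server-salt", "sess-42", "another prompt")
	// Same session, same pseudonym; prompt text is kept verbatim.
	fmt.Println(id1 == id2, len(id1), p)
}
```

Keeping the prompt text verbatim is the point of the flywheel — it is the session-to-person link that must not survive storage.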
Engineering Effort
| Work | Owner | Estimate |
|---|---|---|
| Game frontend (Next.js, event stream, OG cards) | Hamza | 2 weeks |
| Game API + level engine + scoring | Amer | 2 weeks |
| Sandbox infra (Firecracker or Fly, agent harness) | Amer | 1.5 weeks |
| Quint agent integration (subset of prod ES + divergence) | Amer | 1 week |
| Leaderboard + auth gates + share flow | Hamza | 1 week |
| Content: 10 levels, hint text, educational copy | Both | 1 week (parallel) |
| QA, load test, abuse prevention | Both | 0.5 weeks |
Total: ~5 weeks wall-clock with Hamza and Amer working in parallel. Can compress to 4 if we use Fly Machines instead of raw Firecracker for v1 (skip the EC2 metal provisioning).
v1 Shortcuts
- Use Fly Machines instead of Firecracker (managed, fast, good enough isolation for a game)
- Levels 1-5 only for launch; 6-10 in a follow-up drop (creates anticipation)
- Skip Divergence Roulette and MCP Lab for v1 — ship the core game first
Launch Playbook
- Soft launch on X from the Quint company account + Hamza’s personal. No press.
- Seed to 10-15 AI security researchers via DM. They will post solutions.
- Post to Hacker News with title: “We built a game where you try to hack an AI agent — and the OS watches.” HN loves interactive security demos.
- Submit to DEFCON AI Village for demo slot (August timeline works if we ship by mid-June).
- Blog post: “What 10,000 prompts taught us about how attackers think” — publish after 2 weeks of data collection.
Why This Beats Gandalf for Quint
Gandalf proved prompt injection is a real problem. Quint Lab proves that even if the prompt injection succeeds, the OS-level watchdog catches the behavior. The player experiences both sides: the attack AND the detection. That is the entire Quint sales pitch, delivered as a game.
Every failed attempt is a product demo. Every shared card is an ad. Every work email is a lead.