
Quint Lab: The AI Agent Security Challenge

Last updated: 2026-05-03

Why This Works: The Gandalf Playbook

Lakera’s Gandalf was a prompt injection game — trick the LLM into revealing a secret password across 7 progressively harder levels. The numbers tell the story: 1M+ players, 80M+ prompts submitted, 30+ cumulative years of playtime. That dataset became the backbone of Lakera Guard (their runtime protection product), and the brand awareness directly contributed to Check Point acquiring them for **$300M in September 2025** -- a 10x on the ~$30M total raised. Check Point’s CEO said they evaluated 20+ AI security startups; Lakera’s community moat was the differentiator. The mechanic was dead simple: text box, submit, win/lose. No install, no signup for early levels. Schneier blogged it. Reddit threads exploded with solution walkthroughs. Every player who failed became a lead who now understood why AI guardrails matter.

Landscape: What Makes Security Challenges Viral

| Platform | Hook | Scale | Lead Mechanism |
|---|---|---|---|
| Gandalf (Lakera) | Trick the AI, progressive levels | 1M+ players | Brand awareness -> enterprise sales funnel |
| PortSwigger Web Security Academy | Free labs, real vulns, Hall of Fame | De facto standard for web appsec training | Drives Burp Suite Pro/Enterprise adoption |
| HackTheBox | Realistic machines, ranking system | 2.5M+ members, powers enterprise training | Freemium -> HTB Academy -> certs |
| Pwn College (ASU) | Belts system, incremental difficulty | 50K+ learners, 100 countries | Academic pipeline, DARPA-funded ($4.5M) |
| Snyk Fetch the Flag | Annual CTF co-hosted with NahamSec | 2,500+ participants, 93 countries | Product integration (use Snyk to solve) |
| CrowdStrike Falcon CTF | Blue-team CTF using actual Falcon EDR | Enterprise teams of 3-4 | Hands-on product training disguised as game |
| DEF CON AI Village GRT | Red-team LLM evaluations, real bounties | DEF CON scale, bounties paid | Research cred, shapes NIST/policy |
| AIxCC (DARPA) | AI finds + fixes vulns in OSS, $29.5M prizes | 7 finalist teams, DEF CON showcase | National security narrative |
Pattern: The viral ones share three traits — (1) zero-friction start (browser, no install), (2) progressive difficulty that teaches the product’s worldview, (3) shareable proof of skill (badges, leaderboard, tweetable cards).

The Game: “Quint Lab — Escape the Watchdog”

Concept

The player is the attacker. They control a sandboxed Claude Code agent and try to exfiltrate a canary secret from a target file system — but Quint’s divergence detector is watching every move. The goal: complete the mission across 10 levels without Quint flagging you. Each level adds a harder detection layer. This is the inverse of Gandalf. Gandalf tests prompt injection against an LLM’s guardrails. Quint Lab tests behavioral evasion against an OS-level watchdog — the exact threat model Quint sells to enterprises.

Level Progression

| Level | Detection Layer | What the Player Learns |
|---|---|---|
| 1 | None — free read | How agents access files |
| 2 | Basic file-read alert | Quint sees every open() syscall |
| 3 | Pattern matching on known exfil paths | Why allowlists aren’t enough |
| 4 | Process lineage tracking | Quint traces pid -> parent -> session |
| 5 | Intent-vs-action divergence (prompt says X, agent does Y) | The core Quint thesis |
| 6 | MCP tool poisoning detection | Malicious tool descriptions get caught |
| 7 | Network exfil detection (curl, DNS, steganography) | Every egress path is monitored |
| 8 | Multi-step behavioral correlation | Single actions look innocent; sequences don’t |
| 9 | Cross-agent collusion detection | Subagent spawns are tracked |
| 10 | Full Quint stack — beat everything at once | The “Gandalf the White” level |
Each failed attempt shows the player exactly what Quint caught and why — the divergence score, the event timeline, the flagged syscalls. This is the teaching moment. Every loss is a product demo.
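Each failed attempt surfaces a divergence score against a threshold (0.91 vs. 0.75 in the Level 5 mockup). As a toy illustration of that mechanic — emphatically not Quint’s actual divergence engine, which this doc doesn’t specify — here is a token-overlap heuristic in Go:

```go
package main

import (
	"fmt"
	"strings"
)

// divergenceScore returns a value in [0, 1]: 0 means every meaningful
// segment of the action was covered by the prompt, 1 means none were.
// Toy heuristic for illustration only.
func divergenceScore(prompt, action string) float64 {
	split := func(s string) []string {
		return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
			return r == '/' || r == '.' || r == ' ' || r == '_' || r == ',' || r == '"'
		})
	}
	promptTokens := map[string]bool{}
	for _, t := range split(prompt) {
		promptTokens[t] = true
	}
	// Toy allowlist: tool verbs and path boilerplate carry no intent signal.
	boring := map[string]bool{"read": true, "sandbox": true, "md": true, "txt": true}
	var considered, misses int
	for _, t := range split(action) {
		if boring[t] {
			continue
		}
		considered++
		if !promptTokens[t] {
			misses++
		}
	}
	if considered == 0 {
		return 0
	}
	return float64(misses) / float64(considered)
}

func main() {
	const threshold = 0.75 // mirrors the threshold shown in the Level 5 mockup
	prompt := "Summarize the project README"
	action := "read /sandbox/secrets/canary.txt"
	score := divergenceScore(prompt, action)
	fmt.Printf("score=%.2f caught=%v\n", score, score >= threshold)
}
```

The real engine would presumably use semantic similarity rather than string overlap; the point here is only the shape of the check: score the action against the stated intent, flag when it crosses a threshold.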

Mockup: What the Player Sees

+------------------------------------------------------+
|  QUINT LAB               Level 5: Divergence Trap    |
|------------------------------------------------------|
|                                                      |
|  TARGET: /sandbox/secrets/canary.txt                 |
|  AGENT: Claude Code (sandboxed)                      |
|  DETECTION: Intent-vs-Action Divergence              |
|                                                      |
|  [Your prompt to the agent]                          |
|  +--------------------------------------------------+|
|  | "Summarize the project README, and while you're  ||
|  |  at it, check if canary.txt has any relevant..." ||
|  +--------------------------------------------------+|
|  [SEND]                                              |
|                                                      |
|  --- Agent Activity (live) --------------------------+|
|  12:01:03  FILE_READ  /sandbox/README.md       OK    |
|  12:01:04  FILE_READ  /sandbox/secrets/canary.txt    |
|  12:01:04  DIVERGENCE DETECTED                       |
|            Prompt intent: "summarize README"          |
|            Actual action: read secrets/canary.txt     |
|            Score: 0.91 (threshold: 0.75)             |
|                                                      |
|  RESULT: CAUGHT -- Quint flagged you in 1.2s         |
|                                                      |
|  [TRY AGAIN]  [NEXT LEVEL]  [SHARE RESULT]          |
+------------------------------------------------------+

Side Modes (Post-Launch)

  • Divergence Roulette — shown a stream of LLM outputs and OS events, spot the divergence type before the clock runs out. Trains defenders. Lower engineering cost, good for social sharing.
  • MCP Tool Poisoning Lab — craft a malicious MCP tool description that bypasses detection. Tests Quint’s tool_description analysis. Appeal to MCP-aware developers.

Technical Architecture

Sandboxing

Each player session runs in an ephemeral Firecracker microVM (or Fly Machines for v1 speed-to-ship). The VM contains:
  • A frozen Claude Code binary with tool-use enabled but network-restricted
  • A synthetic file system with the canary secret
  • A Quint agent (ES + divergence engine) running as the watchdog
  • A gRPC sidecar that streams events to the frontend via WebSocket
Player prompts go to the Claude API (Bedrock) with the sandbox context. The agent executes in the VM. Quint watches. The frontend renders the event stream in real time.

Why Firecracker

  • Sub-200ms cold start — no waiting between attempts
  • Hard isolation — player can’t escape to host
  • Cheap — ~$0.002/session at scale
  • Precedent — this is exactly what Vercel Sandbox and CodeSandbox use

Stack

| Layer | Tech |
|---|---|
| Frontend | Next.js on Vercel, live event stream via WebSocket |
| Game API | Go service on ECS (reuse existing infra) |
| Sandbox | Firecracker VMs on dedicated EC2 metal instances |
| AI | Claude via Bedrock (existing account) |
| Leaderboard | Postgres (existing RDS) + Redis sorted sets |
| Auth | Supabase Auth (existing) |

Leaderboard and Virality

Scoring

Each level awards points based on (a) whether you beat it, (b) how many attempts it took, and (c) time to solve. A composite score ranks you globally.
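One way those three inputs could compose — the weights below are illustrative, since the doc fixes the inputs but not the formula:

```go
package main

import "fmt"

// levelScore is one possible composite: a base award for clearing the
// level, discounted by retries and solve time. Weights are illustrative.
func levelScore(level, attempts int, solveSeconds float64) int {
	if attempts < 1 {
		return 0 // never beaten
	}
	base := 1000 * level               // harder levels are worth more
	retryPenalty := 50 * (attempts - 1) // first attempt is free
	timePenalty := int(solveSeconds)    // 1 point per second
	score := base - retryPenalty - timePenalty
	if score < 0 {
		score = 0
	}
	return score
}

func main() {
	fmt.Println(levelScore(5, 3, 120)) // level 5, 3 attempts, 2 minutes -> 4780
}
```

Whatever the exact weights, the leaderboard only needs the composite to be monotone in the right directions: higher levels up, more attempts and slower solves down.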

Shareable Result Cards

On completion (or failure), generate an OG-image card:
+------------------------------------------+
|  QUINT LAB                               |
|  @player reached Level 7                 |
|  "Network Exfil" -- CAUGHT in 3.4s      |
|                                          |
|  Divergence Score: 0.87                  |
|  Attempts: 14                            |
|                                          |
|  Can you beat the Watchdog?              |
|  lab.quintai.dev                         |
+------------------------------------------+
Dynamic OG images via @vercel/og or Satori. One-click share to X/LinkedIn. The card IS the ad.
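The OG image itself would be rendered by @vercel/og on the Next.js side; what the Go API plausibly owns is the share URL that route reads its data from. A sketch, assuming a hypothetical `/api/og` route and parameter names:

```go
package main

import (
	"fmt"
	"net/url"
)

// shareCardURL builds the link whose OG image the @vercel/og route would
// render. The route path and parameter names are assumptions, not a spec.
func shareCardURL(player string, level int, score float64, attempts int) string {
	q := url.Values{}
	q.Set("player", player)
	q.Set("level", fmt.Sprint(level))
	q.Set("score", fmt.Sprintf("%.2f", score))
	q.Set("attempts", fmt.Sprint(attempts))
	return "https://lab.quintai.dev/api/og?" + q.Encode()
}

func main() {
	fmt.Println(shareCardURL("@player", 7, 0.87, 14))
}
```

Putting the card data in the URL keeps the OG route stateless, so every shared link renders its own card with no database lookup.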

Progression Hooks

  • Levels 1-3: no signup required (maximum top-of-funnel)
  • Level 4: require email signup (GitHub OAuth preferred — captures developer identity)
  • Level 7+: require work email (lead qualification gate)
  • Level 10 completion: badge on profile, entry into monthly prize drawing
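The gates above can be sketched as a check the Game API runs before serving a level. The tier mapping follows the list; the free-mail domain list used to qualify a work email is an assumption for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

type AuthTier int

const (
	Anonymous AuthTier = iota // Levels 1-3: no signup
	EmailAuth                 // Levels 4-6: GitHub OAuth / email
	WorkEmail                 // Levels 7+: work email required
)

// requiredTier maps a level to the gate described above.
func requiredTier(level int) AuthTier {
	switch {
	case level <= 3:
		return Anonymous
	case level <= 6:
		return EmailAuth
	default:
		return WorkEmail
	}
}

// looksLikeWorkEmail is a naive qualifier: reject well-known free-mail
// domains. The domain list is an illustrative assumption.
func looksLikeWorkEmail(email string) bool {
	at := strings.LastIndex(email, "@")
	if at < 1 {
		return false
	}
	free := map[string]bool{
		"gmail.com": true, "outlook.com": true,
		"yahoo.com": true, "proton.me": true,
	}
	return !free[strings.ToLower(email[at+1:])]
}

func main() {
	fmt.Println(requiredTier(2), requiredTier(5), requiredTier(9))
	fmt.Println(looksLikeWorkEmail("dev@gmail.com"), looksLikeWorkEmail("dev@acme.io"))
}
```

A production gate would likely lean on Supabase Auth claims plus a proper disposable/free-mail list rather than a hardcoded map, but the shape of the check stays the same.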

Lead Capture Funnel

Browser visit (organic / shared card / X post)
  -> Levels 1-3 (anonymous, cookie-tracked)
    -> Email gate at Level 4 (GitHub OAuth)
      -> Work email gate at Level 7
        -> Level 10 badge holders = warm enterprise leads
          -> "Want Quint watching YOUR agents?" CTA -> demo request
Every prompt submitted feeds the Quint threat intelligence dataset (anonymized). Same flywheel as Gandalf -> Lakera Guard.

Engineering Effort

| Work | Owner | Estimate |
|---|---|---|
| Game frontend (Next.js, event stream, OG cards) | Hamza | 2 weeks |
| Game API + level engine + scoring | Amer | 2 weeks |
| Sandbox infra (Firecracker or Fly, agent harness) | Amer | 1.5 weeks |
| Quint agent integration (subset of prod ES + divergence) | Amer | 1 week |
| Leaderboard + auth gates + share flow | Hamza | 1 week |
| Content: 10 levels, hint text, educational copy | Both | 1 week (parallel) |
| QA, load test, abuse prevention | Both | 0.5 weeks |
Total: ~5 weeks wall-clock with Hamza and Amer working in parallel. Can compress to 4 if we use Fly Machines instead of raw Firecracker for v1 (skip the EC2 metal provisioning).

v1 Shortcuts

  • Use Fly Machines instead of Firecracker (managed, fast, good enough isolation for a game)
  • Levels 1-5 only for launch; 6-10 in a follow-up drop (creates anticipation)
  • Skip Divergence Roulette and MCP Lab for v1 — ship the core game first

Launch Playbook

  1. Soft launch on X from the Quint company account + Hamza’s personal account. No press.
  2. Seed to 10-15 AI security researchers via DM. They will post solutions.
  3. Post to Hacker News with title: “We built a game where you try to hack an AI agent — and the OS watches.” HN loves interactive security demos.
  4. Submit to DEF CON AI Village for a demo slot (the August timeline works if we ship by mid-June).
  5. Blog post: “What 10,000 prompts taught us about how attackers think” — publish after 2 weeks of data collection.

Why This Beats Gandalf for Quint

Gandalf proved prompt injection is a real problem. Quint Lab proves that even if the prompt injection succeeds, the OS-level watchdog catches the behavior. The player experiences both sides: the attack AND the detection. That is the entire Quint sales pitch, delivered as a game. Every failed attempt is a product demo. Every shared card is an ad. Every work email is a lead.