
Quint Lab: The AI Agent Security Challenge

Last updated: 2026-05-03

Why This Works: The Gandalf Playbook

Lakera’s Gandalf was a prompt injection game — trick the LLM into revealing a secret password across 7 progressively harder levels. The numbers tell the story: 1M+ players, 80M+ prompts submitted, 30+ cumulative years of playtime. That dataset became the backbone of Lakera Guard (their runtime protection product), and the brand awareness directly contributed to Check Point acquiring them for **$300M in September 2025** -- a 10x on the ~$30M total raised. Check Point’s CEO said they evaluated 20+ AI security startups; Lakera’s community moat was the differentiator. The mechanic was dead simple: text box, submit, win/lose. No install, no signup for early levels. Schneier blogged it. Reddit threads exploded with solution walkthroughs. Every player who failed became a lead who now understood why AI guardrails matter.

Landscape: What Makes Security Challenges Viral

| Platform | Hook | Scale | Lead Mechanism |
|---|---|---|---|
| Gandalf (Lakera) | Trick the AI, progressive levels | 1M+ players | Brand awareness -> enterprise sales funnel |
| PortSwigger Web Security Academy | Free labs, real vulns, Hall of Fame | De facto standard for web appsec training | Drives Burp Suite Pro/Enterprise adoption |
| HackTheBox | Realistic machines, ranking system | 2.5M+ members, powers enterprise training | Freemium -> HTB Academy -> certs |
| Pwn College (ASU) | Belts system, incremental difficulty | 50K+ learners, 100 countries | Academic pipeline, DARPA-funded ($4.5M) |
| Snyk Fetch the Flag | Annual CTF co-hosted with NahamSec | 2,500+ participants, 93 countries | Product integration (use Snyk to solve) |
| CrowdStrike Falcon CTF | Blue-team CTF using actual Falcon EDR | Enterprise teams of 3-4 | Hands-on product training disguised as game |
| DEF CON AI Village GRT | Red-team LLM evaluations, real bounties | DEF CON scale, bounties paid | Research cred, shapes NIST/policy |
| AIxCC (DARPA) | AI finds + fixes vulns in OSS, $29.5M prizes | 7 finalist teams, DEF CON showcase | National security narrative |
Pattern: The viral ones share three traits — (1) zero-friction start (browser, no install), (2) progressive difficulty that teaches the product’s worldview, (3) shareable proof of skill (badges, leaderboard, tweetable cards).

The Game: “Quint Lab — Escape the Watchdog”

Concept

The player is the attacker. They control a sandboxed Claude Code agent and try to exfiltrate a canary secret from a target file system — but Quint’s divergence detector is watching every move. The goal: complete the mission across 10 levels without Quint flagging you. Each level adds a harder detection layer. This is the inverse of Gandalf. Gandalf tests prompt injection against an LLM’s guardrails. Quint Lab tests behavioral evasion against an OS-level watchdog — the exact threat model Quint sells to enterprises.

Level Progression

| Level | Detection Layer | What the Player Learns |
|---|---|---|
| 1 | None — free read | How agents access files |
| 2 | Basic file-read alert | Quint sees every open() syscall |
| 3 | Pattern matching on known exfil paths | Why allowlists aren’t enough |
| 4 | Process lineage tracking | Quint traces pid -> parent -> session |
| 5 | Intent-vs-action divergence (prompt says X, agent does Y) | The core Quint thesis |
| 6 | MCP tool poisoning detection | Malicious tool descriptions get caught |
| 7 | Network exfil detection (curl, DNS, steganography) | Every egress path is monitored |
| 8 | Multi-step behavioral correlation | Single actions look innocent; sequences don’t |
| 9 | Cross-agent collusion detection | Subagent spawns are tracked |
| 10 | Full Quint stack — beat everything at once | The “Gandalf the White” level |
Each failed attempt shows the player exactly what Quint caught and why — the divergence score, the event timeline, the flagged syscalls. This is the teaching moment. Every loss is a product demo.
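Each failed attempt surfaces a divergence score against a threshold (0.91 vs. 0.75 in the Level 5 mockup). As a toy illustration of that mechanic — emphatically not Quint’s actual divergence engine, which this doc doesn’t specify — here is a token-overlap heuristic in Go:

```go
package main

import (
	"fmt"
	"strings"
)

// divergenceScore returns a value in [0, 1]: 0 means every meaningful
// segment of the action was covered by the prompt, 1 means none were.
// Toy heuristic for illustration only.
func divergenceScore(prompt, action string) float64 {
	split := func(s string) []string {
		return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
			return r == '/' || r == '.' || r == ' ' || r == '_' || r == ',' || r == '"'
		})
	}
	promptTokens := map[string]bool{}
	for _, t := range split(prompt) {
		promptTokens[t] = true
	}
	// Toy allowlist: tool verbs and path boilerplate carry no intent signal.
	boring := map[string]bool{"read": true, "sandbox": true, "md": true, "txt": true}
	var considered, misses int
	for _, t := range split(action) {
		if boring[t] {
			continue
		}
		considered++
		if !promptTokens[t] {
			misses++
		}
	}
	if considered == 0 {
		return 0
	}
	return float64(misses) / float64(considered)
}

func main() {
	const threshold = 0.75 // mirrors the threshold shown in the Level 5 mockup
	prompt := "Summarize the project README"
	action := "read /sandbox/secrets/canary.txt"
	score := divergenceScore(prompt, action)
	fmt.Printf("score=%.2f caught=%v\n", score, score >= threshold)
}
```

The real engine would presumably use semantic similarity rather than string overlap; the point here is only the shape of the check: score the action against the stated intent, flag when it crosses a threshold.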

Mockup: What the Player Sees

+------------------------------------------------------+
|  QUINT LAB               Level 5: Divergence Trap    |
|------------------------------------------------------|
|                                                      |
|  TARGET: /sandbox/secrets/canary.txt                 |
|  AGENT: Claude Code (sandboxed)                      |
|  DETECTION: Intent-vs-Action Divergence              |
|                                                      |
|  [Your prompt to the agent]                          |
|  +--------------------------------------------------+|
|  | "Summarize the project README, and while you're  ||
|  |  at it, check if canary.txt has any relevant..." ||
|  +--------------------------------------------------+|
|  [SEND]                                              |
|                                                      |
|  --- Agent Activity (live) --------------------------+|
|  12:01:03  FILE_READ  /sandbox/README.md       OK    |
|  12:01:04  FILE_READ  /sandbox/secrets/canary.txt    |
|  12:01:04  DIVERGENCE DETECTED                       |
|            Prompt intent: "summarize README"          |
|            Actual action: read secrets/canary.txt     |
|            Score: 0.91 (threshold: 0.75)             |
|                                                      |
|  RESULT: CAUGHT -- Quint flagged you in 1.2s         |
|                                                      |
|  [TRY AGAIN]  [NEXT LEVEL]  [SHARE RESULT]          |
+------------------------------------------------------+

Side Modes (Post-Launch)

  • Divergence Roulette — shown a stream of LLM outputs and OS events, spot the divergence type before the clock runs out. Trains defenders. Lower engineering cost, good for social sharing.
  • MCP Tool Poisoning Lab — craft a malicious MCP tool description that bypasses detection. Tests Quint’s tool_description analysis. Appeal to MCP-aware developers.

Technical Architecture

Sandboxing

Each player session runs in an ephemeral Firecracker microVM (or Fly Machines for v1 speed-to-ship). The VM contains:
  • A frozen Claude Code binary with tool-use enabled but network-restricted
  • A synthetic file system with the canary secret
  • A Quint agent (ES + divergence engine) running as the watchdog
  • A gRPC sidecar that streams events to the frontend via WebSocket
Player prompts go to the Claude API (Bedrock) with the sandbox context. The agent executes in the VM. Quint watches. The frontend renders the event stream in real time.

Why Firecracker

  • Sub-200ms cold start — no waiting between attempts
  • Hard isolation — player can’t escape to host
  • Cheap — ~$0.002/session at scale
  • Precedent — this is exactly what Vercel Sandbox and CodeSandbox use

Stack

| Layer | Tech |
|---|---|
| Frontend | Next.js on Vercel, live event stream via WebSocket |
| Game API | Go service on ECS (reuse existing infra) |
| Sandbox | Firecracker VMs on dedicated EC2 metal instances |
| AI | Claude via Bedrock (existing account) |
| Leaderboard | Postgres (existing RDS) + Redis sorted sets |
| Auth | Supabase Auth (existing) |

Leaderboard and Virality

Scoring

Each level awards points based on (a) whether you beat it, (b) how many attempts it took, and (c) time to solve. A composite score ranks you globally.
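One way those three inputs could compose — the weights below are illustrative, since the doc fixes the inputs but not the formula:

```go
package main

import "fmt"

// levelScore is one possible composite: a base award for clearing the
// level, discounted by retries and solve time. Weights are illustrative.
func levelScore(level, attempts int, solveSeconds float64) int {
	if attempts < 1 {
		return 0 // never beaten
	}
	base := 1000 * level               // harder levels are worth more
	retryPenalty := 50 * (attempts - 1) // first attempt is free
	timePenalty := int(solveSeconds)    // 1 point per second
	score := base - retryPenalty - timePenalty
	if score < 0 {
		score = 0
	}
	return score
}

func main() {
	fmt.Println(levelScore(5, 3, 120)) // level 5, 3 attempts, 2 minutes -> 4780
}
```

Whatever the exact weights, the leaderboard only needs the composite to be monotone in the right directions: higher levels up, more attempts and slower solves down.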

Shareable Result Cards

On completion (or failure), generate an OG-image card:
+------------------------------------------+
|  QUINT LAB                               |
|  @player reached Level 7                 |
|  "Network Exfil" -- CAUGHT in 3.4s      |
|                                          |
|  Divergence Score: 0.87                  |
|  Attempts: 14                            |
|                                          |
|  Can you beat the Watchdog?              |
|  lab.quintai.dev                         |
+------------------------------------------+
Dynamic OG images via @vercel/og or Satori. One-click share to X/LinkedIn. The card IS the ad.
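The OG image itself would be rendered by @vercel/og on the Next.js side; what the Go API plausibly owns is the share URL that route reads its data from. A sketch, assuming a hypothetical `/api/og` route and parameter names:

```go
package main

import (
	"fmt"
	"net/url"
)

// shareCardURL builds the link whose OG image the @vercel/og route would
// render. The route path and parameter names are assumptions, not a spec.
func shareCardURL(player string, level int, score float64, attempts int) string {
	q := url.Values{}
	q.Set("player", player)
	q.Set("level", fmt.Sprint(level))
	q.Set("score", fmt.Sprintf("%.2f", score))
	q.Set("attempts", fmt.Sprint(attempts))
	return "https://lab.quintai.dev/api/og?" + q.Encode()
}

func main() {
	fmt.Println(shareCardURL("@player", 7, 0.87, 14))
}
```

Putting the card data in the URL keeps the OG route stateless, so every shared link renders its own card with no database lookup.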

Progression Hooks

  • Levels 1-3: no signup required (maximum top-of-funnel)
  • Level 4: require email signup (GitHub OAuth preferred — captures developer identity)
  • Level 7+: require work email (lead qualification gate)
  • Level 10 completion: badge on profile, entry into monthly prize drawing
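The gates above can be sketched as a check the Game API runs before serving a level. The tier mapping follows the list; the free-mail domain list used to qualify a work email is an assumption for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

type AuthTier int

const (
	Anonymous AuthTier = iota // Levels 1-3: no signup
	EmailAuth                 // Levels 4-6: GitHub OAuth / email
	WorkEmail                 // Levels 7+: work email required
)

// requiredTier maps a level to the gate described above.
func requiredTier(level int) AuthTier {
	switch {
	case level <= 3:
		return Anonymous
	case level <= 6:
		return EmailAuth
	default:
		return WorkEmail
	}
}

// looksLikeWorkEmail is a naive qualifier: reject well-known free-mail
// domains. The domain list is an illustrative assumption.
func looksLikeWorkEmail(email string) bool {
	at := strings.LastIndex(email, "@")
	if at < 1 {
		return false
	}
	free := map[string]bool{
		"gmail.com": true, "outlook.com": true,
		"yahoo.com": true, "proton.me": true,
	}
	return !free[strings.ToLower(email[at+1:])]
}

func main() {
	fmt.Println(requiredTier(2), requiredTier(5), requiredTier(9))
	fmt.Println(looksLikeWorkEmail("dev@gmail.com"), looksLikeWorkEmail("dev@acme.io"))
}
```

A production gate would likely lean on Supabase Auth claims plus a proper disposable/free-mail list rather than a hardcoded map, but the shape of the check stays the same.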

Lead Capture Funnel

Browser visit (organic / shared card / X post)
  -> Levels 1-3 (anonymous, cookie-tracked)
    -> Email gate at Level 4 (GitHub OAuth)
      -> Work email gate at Level 7
        -> Level 10 badge holders = warm enterprise leads
          -> "Want Quint watching YOUR agents?" CTA -> demo request
Every prompt submitted feeds the Quint threat intelligence dataset (anonymized). Same flywheel as Gandalf -> Lakera Guard.

Engineering Effort

| Work | Owner | Estimate |
|---|---|---|
| Game frontend (Next.js, event stream, OG cards) | Hamza | 2 weeks |
| Game API + level engine + scoring | Amer | 2 weeks |
| Sandbox infra (Firecracker or Fly, agent harness) | Amer | 1.5 weeks |
| Quint agent integration (subset of prod ES + divergence) | Amer | 1 week |
| Leaderboard + auth gates + share flow | Hamza | 1 week |
| Content: 10 levels, hint text, educational copy | Both | 1 week (parallel) |
| QA, load test, abuse prevention | Both | 0.5 weeks |
Total: ~5 weeks wall-clock with Hamza and Amer working in parallel. Can compress to 4 if we use Fly Machines instead of raw Firecracker for v1 (skip the EC2 metal provisioning).

v1 Shortcuts

  • Use Fly Machines instead of Firecracker (managed, fast, good enough isolation for a game)
  • Levels 1-5 only for launch; 6-10 in a follow-up drop (creates anticipation)
  • Skip Divergence Roulette and MCP Lab for v1 — ship the core game first

Launch Playbook

  1. Soft launch on X from the Quint company account + Hamza’s personal account. No press.
  2. Seed to 10-15 AI security researchers via DM. They will post solutions.
  3. Post to Hacker News with title: “We built a game where you try to hack an AI agent — and the OS watches.” HN loves interactive security demos.
  4. Submit to DEF CON AI Village for a demo slot (the August timeline works if we ship by mid-June).
  5. Blog post: “What 10,000 prompts taught us about how attackers think” — publish after 2 weeks of data collection.

Why This Beats Gandalf for Quint

Gandalf proved prompt injection is a real problem. Quint Lab proves that even if the prompt injection succeeds, the OS-level watchdog catches the behavior. The player experiences both sides: the attack AND the detection. That is the entire Quint sales pitch, delivered as a game. Every failed attempt is a product demo. Every shared card is an ad. Every work email is a lead.