
quint test-platforms

An automated verification harness that runs the Platform Test Matrix against a live daemon + cloud stack. Designed so you can run it in CI, on your own laptop during development, or before every demo.

Design

The CLI is a new subcommand in cmd/proxy/test_platforms.go, registered in main.go alongside daemon, watch, and setup. It is mostly a verification tool, not a simulation tool — it assumes the user (or a wrapper script) launches the agent, types a prompt, and lets the CLI verify the results afterward. This is the honest constraint: you can’t simulate Cursor or Claude Desktop without reverse-engineering their entire UI flow. What you can do is verify that Quint’s observation layer caught what actually happened.
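
For illustration, a minimal sketch of what the entry point could look like, using only stdlib flag parsing to keep the no-new-dependencies note below honest. The testplatforms.Options struct, the Run entry point, and the module path are hypothetical names; the flags mirror the Usage section:

// cmd/proxy/test_platforms.go (sketch; all names illustrative, not final)
package main

import (
    "context"
    "flag"

    "quint/internal/testplatforms" // hypothetical module path
)

// runTestPlatforms parses the subcommand's flags and hands off to the runner.
func runTestPlatforms(ctx context.Context, args []string) error {
    fs := flag.NewFlagSet("test-platforms", flag.ExitOnError)
    var opts testplatforms.Options
    fs.StringVar(&opts.Platform, "platform", "", "platform to verify (e.g. cursor)")
    fs.IntVar(&opts.PID, "pid", 0, "target process ID")
    fs.BoolVar(&opts.Watch, "watch", false, "wait for the next session of this platform")
    fs.BoolVar(&opts.All, "all", false, "run the full registry")
    fs.BoolVar(&opts.Interactive, "interactive", false, "prompt to launch each platform in sequence")
    fs.BoolVar(&opts.Cloud, "cloud", false, "also run cloud checks 6 and 7")
    fs.StringVar(&opts.Output, "output", "", "write machine-readable results to this path")
    fs.BoolVar(&opts.Quiet, "quiet", false, "suppress human-readable output")
    if err := fs.Parse(args); err != nil {
        return err
    }
    return testplatforms.Run(ctx, opts)
}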

Usage

# Verify a specific platform right now
quint test-platforms --platform cursor --pid 59247

# Wait for a platform to be launched and verify the next session
quint test-platforms --platform claude-code --watch

# Run the full registry — prompts you to launch each platform in sequence
quint test-platforms --all --interactive

# Run against cloud too (requires a deploy token)
quint test-platforms --platform cursor --pid 59247 --cloud

# CI mode: expects stdin input, writes machine-readable output to results.yml
quint test-platforms --platform cursor --pid 59247 --output results.yml --quiet

Flow

The user (or a wrapper script) launches the agent and completes a minimal interaction; the CLI then polls the daemon’s debug endpoints and runs the seven checks below in order, reporting pass or fail for each.

The Checks

Each check maps 1:1 to a row in the Platform Test Matrix:

Check 1 — es_process_exec

What: Confirms the Endpoint Security extension delivered a NOTIFY_EXEC event for the target PID to the daemon.
How: GET localhost:8080/debug/es-events?pid=<PID>&since=<T-5s> — the daemon exposes a ring buffer of recent ES events (already planned per memory project_ingestion_gaps.md; this may require a new debug endpoint if not yet exposed).
Pass: The PID appears in the event stream within 5 seconds of process start.
Common failure: ES buffer overflow (see project_ingestion_gaps.md) or a ColdStart race.
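
A sketch of what this check could look like in checks.go. The JSON shape of /debug/es-events (including passing since as Unix seconds) and the debugClient helper (a sketch of it appears under Prerequisites below) are assumptions, not a fixed contract:

// internal/testplatforms/checks.go (sketch)
package testplatforms

import (
    "context"
    "fmt"
    "time"
)

// checkESProcessExec polls the ES event ring buffer until the target PID
// shows up with a NOTIFY_EXEC event, or the 5-second pass window expires.
func checkESProcessExec(ctx context.Context, c *debugClient, pid int, started time.Time) error {
    path := fmt.Sprintf("/debug/es-events?pid=%d&since=%d", pid, started.Add(-5*time.Second).Unix())
    deadline := started.Add(5 * time.Second)
    for time.Now().Before(deadline) {
        var events []struct {
            PID  int    `json:"pid"` // assumed field names
            Type string `json:"type"`
        }
        if err := c.getJSON(ctx, path, &events); err != nil {
            return err
        }
        for _, ev := range events {
            if ev.PID == pid && ev.Type == "NOTIFY_EXEC" {
                return nil // pass
            }
        }
        time.Sleep(250 * time.Millisecond) // the event may lag the exec slightly
    }
    return fmt.Errorf("no NOTIFY_EXEC for pid %d within 5s (ES buffer overflow or ColdStart race?)", pid)
}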

Check 2 — agent_identified

What: Confirms unisession.Tracker has a session for this PID and that it is attributed to the expected platform, not unknown.
How: GET localhost:8080/debug/sessions → find the session by PID → assert platform == expected.
Pass: The session’s platform field matches exactly.
Common failure: Fingerprint mismatch (the platform changed its binary signing, renamed its headers, etc.).

Check 3 — code_signing (only for platforms with teamID in registry)

What: Confirms the session record carries the expected macOS code-signing identity.
How: Same session API → assert .teamID and .signingID match the registry.
Pass: Both fields are present and equal to the registry values.
Common failure: The platform re-signed under a new team, or the binary was replaced by a fake with a different cert.
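
Checks 2 and 3 read the same session record, so one fetch can serve both. Continuing the sketch above; sessionRecord's field names are assumptions, and PlatformDef stands in for whatever registry.go loads from the platform definitions:

// Continuing the sketch in checks.go.
type sessionRecord struct {
    PID       int    `json:"pid"` // assumed field names
    Platform  string `json:"platform"`
    TeamID    string `json:"teamID"`
    SigningID string `json:"signingID"`
}

// checkAgentIdentified finds the session for the PID and verifies attribution.
func checkAgentIdentified(ctx context.Context, c *debugClient, pid int, want PlatformDef) (*sessionRecord, error) {
    var sessions []sessionRecord
    if err := c.getJSON(ctx, "/debug/sessions", &sessions); err != nil {
        return nil, err
    }
    for i := range sessions {
        if sessions[i].PID != pid {
            continue
        }
        if got := sessions[i].Platform; got != want.Name {
            return nil, fmt.Errorf("pid %d attributed to %q, want %q (fingerprint mismatch?)", pid, got, want.Name)
        }
        return &sessions[i], nil
    }
    return nil, fmt.Errorf("no session for pid %d", pid)
}

// checkCodeSigning compares the signing identity against the registry.
func checkCodeSigning(s *sessionRecord, want PlatformDef) error {
    if want.TeamID == "" {
        return nil // check only applies to platforms with a teamID in the registry
    }
    if s.TeamID != want.TeamID || s.SigningID != want.SigningID {
        return fmt.Errorf("signing identity %s/%s, want %s/%s", s.TeamID, s.SigningID, want.TeamID, want.SigningID)
    }
    return nil
}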

Check 4 — ne_intercepts

What: Confirms the Network Extension relayed at least one HTTPS flow for this PID rather than passing it through.
How: GET localhost:8080/debug/flows?pid=<PID>&since=<T-30s> → assert at least one flow has relay: true.
Pass: One or more intercepted flows.
Common failure: Domain not in the NE allowlist; HTTP/2 negotiation causes a bypass (38% of flows per memory).
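
The flow check follows the same pattern; the flow JSON field name is again an assumption:

// Continuing the sketch: at least one flow for this PID must be relayed.
func checkNEIntercepts(ctx context.Context, c *debugClient, pid int, since time.Time) error {
    var flows []struct {
        Relay bool `json:"relay"` // assumed field name
    }
    path := fmt.Sprintf("/debug/flows?pid=%d&since=%d", pid, since.Unix())
    if err := c.getJSON(ctx, path, &flows); err != nil {
        return err
    }
    for _, f := range flows {
        if f.Relay {
            return nil // pass: at least one intercepted flow
        }
    }
    return fmt.Errorf("%d flows for pid %d, none relayed (allowlist miss or HTTP/2 bypass?)", len(flows), pid)
}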

Check 5 — llm_parser

What: Confirms the llmparse router correctly identified the LLM protocol and emitted at least one tool call (or assistant text, if no tool call happened).
How: GET localhost:8080/api/sessions/timeline?pid=<PID> → assert at least one tool_call or assistant_text entry and zero PARSE_FAILED warnings in the last 30s of the daemon log.
Pass: The timeline has content and there are no parse failures.
Common failure: Wrong parser selected (router logic in llmparse/router.go), or a new API shape is not recognized.
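
And the parser check, assuming the timeline endpoint returns typed entries. Scanning the daemon log for PARSE_FAILED is elided here since the log access mechanism is not specified:

// Continuing the sketch: the timeline must contain parsed LLM content.
func checkLLMParser(ctx context.Context, c *debugClient, pid int) error {
    var timeline []struct {
        Type string `json:"type"` // assumed values: "tool_call", "assistant_text", ...
    }
    if err := c.getJSON(ctx, fmt.Sprintf("/api/sessions/timeline?pid=%d", pid), &timeline); err != nil {
        return err
    }
    for _, e := range timeline {
        if e.Type == "tool_call" || e.Type == "assistant_text" {
            return nil // pass (PARSE_FAILED log scan omitted in this sketch)
        }
    }
    return fmt.Errorf("timeline for pid %d has no parsed content (wrong parser selected?)", pid)
}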

Check 6 — cloud_session (only with --cloud flag)

What: Confirms the session was delivered to the cloud and a Postgres row exists.
How: GET api.quintai.dev/v1/sessions?agent_id=<local_id> with a deploy token → assert a non-empty response.
Pass: The cloud returns the session.
Common failure: Forwarder overflow; ingest 4xx (schema version mismatch, validation reject); pipeline consumer lag.

Check 7 — events_attributed (only with --cloud flag)

What: Confirms all tool calls observed in the local timeline also exist in the cloud, stamped with the correct agent_id.
How: GET api.quintai.dev/v1/events?session_id=<cloud_session_id> → compare the count to the local timeline.
Pass: The cloud count equals the local count, and every row has agent_id == expected.
Common failure: Parent-cascade over-attribution (subagents stamping the wrong ID); dedup drop; forwarder backpressure.
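
The attribution check reduces to a count comparison plus a per-row ID assertion. A sketch, with the cloud event shape assumed and the cloud HTTP client elided:

// Continuing the sketch: every local tool call must exist in the cloud,
// stamped with the expected agent_id.
type cloudEvent struct {
    ID      string `json:"id"` // assumed field names
    AgentID string `json:"agent_id"`
}

func checkEventsAttributed(localCount int, cloudEvents []cloudEvent, wantAgentID string) error {
    if len(cloudEvents) != localCount {
        return fmt.Errorf("cloud has %d events, local timeline has %d (dedup drop or forwarder backpressure?)",
            len(cloudEvents), localCount)
    }
    for _, ev := range cloudEvents {
        if ev.AgentID != wantAgentID {
            return fmt.Errorf("event %s stamped agent_id=%s, want %s (parent-cascade over-attribution?)",
                ev.ID, ev.AgentID, wantAgentID)
        }
    }
    return nil
}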

Output Format

# results.yml
platform: cursor
version: "0.43.2"
pid: 59247
session_id: "59247-1777085379101"
macos_version: "15.5"
quint_version: "1.0.3"
tested_at: "2026-05-03T14:22:11Z"

checks:
  es_process_exec:
    status: pass
    latency_ms: 12
  agent_identified:
    status: pass
    detected_platform: cursor
    detection_layers: [teamID, signingID, domain, headers, prompt, UA]
  code_signing:
    status: pass
    team_id: VDXQ22DGB9
    signing_id: com.todesktop.230313mzl4w4u92
  ne_intercepts:
    status: pass
    flow_count: 14
    domains: ["api2.cursor.sh", "api3.cursor.sh"]
  llm_parser:
    status: pass
    tool_calls: 3
    assistant_text_blocks: 5
    parse_failures: 0
    parser_used: generic  # cursor's Connect RPC falls through to generic
  cloud_session:
    status: pass
    cloud_id: "a1b2c3d4-..."
  events_attributed:
    status: pass
    local_count: 3
    cloud_count: 3

warnings:
  - "x-cursor-client-version header not present (was in registry)"
  - "Using generic LLM parser for Cursor's Connect RPC traffic"

overall: pass
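
For CI consumers, the report maps naturally onto a couple of Go types. A sketch: the per-check extras (latency_ms, flow_count, domains, and so on) are kept as a free-form map to match the example above, and the YAML encoder (gopkg.in/yaml.v3 here) would be the one dependency beyond stdlib unless the report is hand-rendered:

// internal/testplatforms/report.go (sketch)
package testplatforms

import (
    "os"

    "gopkg.in/yaml.v3"
)

type Report struct {
    Platform     string                    `yaml:"platform"`
    Version      string                    `yaml:"version"`
    PID          int                       `yaml:"pid"`
    SessionID    string                    `yaml:"session_id"`
    MacOSVersion string                    `yaml:"macos_version"`
    QuintVersion string                    `yaml:"quint_version"`
    TestedAt     string                    `yaml:"tested_at"` // RFC 3339
    Checks       map[string]map[string]any `yaml:"checks"`    // status plus check-specific fields
    Warnings     []string                  `yaml:"warnings,omitempty"`
    Overall      string                    `yaml:"overall"` // "pass" or "fail"
}

// WriteYAML renders the report for --output / CI consumption.
func (r *Report) WriteYAML(path string) error {
    b, err := yaml.Marshal(r)
    if err != nil {
        return err
    }
    return os.WriteFile(path, b, 0o644)
}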

Implementation Plan

The CLI is ~400 lines of Go. No new dependencies beyond stdlib + the existing daemon HTTP client. Rough file layout:
cmd/proxy/
  test_platforms.go        # cobra-style subcommand entry
  
internal/testplatforms/
  runner.go                # runs the 7 checks
  checks.go                # individual check functions
  registry.go              # loads platform definitions (reuses agentdetect)
  report.go                # formats output (yaml, human)
  debug_client.go          # HTTP client to localhost:8080 debug endpoints

Prerequisites that need to exist

Two daemon endpoints will likely need to be added/exposed:
  1. GET /debug/es-events — ring buffer of recent ES events (may already exist internally; needs to be exposed on localhost:8080).
  2. GET /debug/sessions with detailed fields including teamID, signingID, platform — may need to extend the existing sessions endpoint.
Both require an X-Quint-Dashboard-Token header (per the existing convention in memory reference_ingestion_debugging.md).
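
A sketch of that shared client. The header name follows the convention cited above; the base URL, timeout, and token plumbing are assumptions:

// internal/testplatforms/debug_client.go (sketch)
package testplatforms

import (
    "context"
    "encoding/json"
    "fmt"
    "net/http"
    "time"
)

// debugClient wraps the daemon's localhost debug API; every request carries
// the dashboard token header.
type debugClient struct {
    base  string // e.g. "http://localhost:8080"
    token string
    hc    *http.Client
}

func newDebugClient(base, token string) *debugClient {
    return &debugClient{base: base, token: token, hc: &http.Client{Timeout: 3 * time.Second}}
}

// getJSON GETs base+path and decodes the JSON response into out.
func (c *debugClient) getJSON(ctx context.Context, path string, out any) error {
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, c.base+path, nil)
    if err != nil {
        return err
    }
    req.Header.Set("X-Quint-Dashboard-Token", c.token)
    resp, err := c.hc.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return fmt.Errorf("GET %s: %s", path, resp.Status)
    }
    return json.NewDecoder(resp.Body).Decode(out)
}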

Development order

  1. Day 1: Add the two debug endpoints to the daemon (~100 lines of Go).
  2. Day 2: Build the CLI scaffold + Check 1 + Check 2 (process + identification). Manual test on Claude Code.
  3. Day 3: Add Checks 3-5 (signing, NE, parser). Manual test on Claude Code again.
  4. Day 4: Add --cloud mode + Checks 6-7.
  5. Day 5: Report formatting, --interactive mode, --all mode, docs.
Total: one week for one engineer.

How to Use It Before a Demo

The morning of a demo, on the actual demo laptop:
# Pre-flight: make sure everything still works
quint test-platforms --platform claude-code --watch &
# In another terminal: launch Claude Code, type "list files in this directory"
# Back in the test-platforms terminal: should print PASS in green within 10s

# Run the full matrix
quint test-platforms --all --interactive --cloud
# Follow the prompts: launch each agent and complete a minimal interaction

# If any FAIL, stop the demo, fix or swap out the failing platform
No more “I hope Cursor works today.”

Evolution

Once this exists, the next upgrade is a passive mode that runs continuously in the daemon (not as a separate CLI) and logs a rolling report of which platforms have been observed working in the last 7 days. That turns into the “Platform Coverage” dashboard page — the honest public-facing version of the current aspirational list.
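
A sketch of the bookkeeping the passive mode might need: a daemon-side tracker that remembers the last time each platform cleared all checks, queried with a 7-day window for the dashboard page. Everything here is hypothetical:

// Hypothetical daemon-side tracker for the passive mode.
package coverage

import (
    "sync"
    "time"
)

type Tracker struct {
    mu       sync.Mutex
    lastPass map[string]time.Time // platform -> last time all checks passed
}

func NewTracker() *Tracker {
    return &Tracker{lastPass: make(map[string]time.Time)}
}

// RecordPass is called whenever a live session clears every check.
func (t *Tracker) RecordPass(platform string) {
    t.mu.Lock()
    defer t.mu.Unlock()
    t.lastPass[platform] = time.Now()
}

// CoveredWithin reports which platforms have been observed working inside
// the rolling window (7 days for the dashboard page).
func (t *Tracker) CoveredWithin(window time.Duration) []string {
    t.mu.Lock()
    defer t.mu.Unlock()
    cutoff := time.Now().Add(-window)
    var out []string
    for p, ts := range t.lastPass {
        if ts.After(cutoff) {
            out = append(out, p)
        }
    }
    return out
}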