

ML Roadmap

Quint’s detection architecture follows a five-stage ladder. Each stage has objective triggers that determine when it advances to production. This page is the authoritative status of each stage and the observable conditions required for advancement.

Stage 0: Deterministic Rules — Live

What it does

Pattern matching, regex scanning, and threshold logic over tool names, arguments, and behavioral counts. Catches explicit deny-list matches, known-dangerous command patterns, canary token access, and brute-force velocity violations.

Latency

Sub-microsecond per evaluation. Pure in-memory lookups with zero allocations on the fast path.

Data requirements

None. Rules work from day one of any deployment.

Known limitations

  • Only catches patterns that have been hand-coded into rules
  • Does not adapt to per-agent behavioral baselines
  • High false-negative rate on novel attack patterns

How it advances

Stage 0 is a permanent foundation, not a stage to advance past. Every subsequent stage runs alongside the rule engine, not in place of it. Rules provide the interpretable deterministic floor that makes the probabilistic stages safe to enable.

Stage 1: Probabilistic Fingerprints — Shadow Mode

What it does

Per-agent behavioral baseline using probabilistic data structures. Tracks tool usage history, frequency distributions, capability shape, and inter-action timing. Produces six deviation signals per event, corroborated through a four-gate pipeline into KNOWN_SAFE, UNCERTAIN, or ANOMALOUS band classifications.
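The gate-and-band structure can be sketched as follows. The signal names, gate thresholds, and corroboration count here are placeholders chosen for illustration; the real gate semantics are documented on the Scoring Pipeline page.

```python
# Illustrative gated band classifier over six deviation signals,
# each normalized to [0, 1]. All thresholds are invented for the sketch.
SIGNALS = ("tool_novelty", "freq_shift", "capability_shape",
           "timing_gap", "arg_entropy", "sequence_rarity")

def classify(signals: dict[str, float]) -> str:
    scores = [signals[name] for name in SIGNALS]
    # Gate 1: common-case fast exit -- everything comfortably in baseline.
    if all(s < 0.2 for s in scores):
        return "KNOWN_SAFE"
    # Gate 2: a single extreme signal is anomalous on its own.
    if any(s > 0.95 for s in scores):
        return "ANOMALOUS"
    # Gate 3: corroboration -- several moderately elevated signals together.
    fired = sum(s > 0.6 for s in scores)
    if fired >= 3:
        return "ANOMALOUS"
    # Gate 4: everything else warrants a closer look downstream.
    return "UNCERTAIN"
```

The early Gate 1 exit is what makes the common-case path cheap: most events never reach the corroboration logic at all.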

Latency

Under one millisecond per evaluation on commodity hardware. The common-case path (95% of actions exit at Gate 1) completes in approximately 300 nanoseconds.

Data requirements

Approximately 500 actions per agent for baseline stabilization. In practical terms, a few hours to a few days of normal agent operation, depending on the agent’s activity level.

Current status

The full gate pipeline is implemented and tested. Production deployment runs the pipeline in shadow mode: all signals compute, all bands are determined, but enforcement decisions are made by Stage 0 alone. The shadow mode output is logged and compared against expected baselines while calibration data accumulates.
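The shadow-mode wiring amounts to: both stages evaluate every event, but only Stage 0's decision is enforced. A minimal sketch, where `stage0_evaluate`, `stage1_classify`, and `log_shadow` are hypothetical stand-ins for the real components:

```python
# Shadow-mode wiring: Stage 1 computes on every event but cannot block.
def handle_event(event, stage0_evaluate, stage1_classify, log_shadow):
    decision = stage0_evaluate(event)   # enforcement: rules only
    band = stage1_classify(event)       # computed, logged, never enforced
    log_shadow(event, band, decision)   # compared against baselines offline
    return decision                     # Stage 1 output never reaches here
```

The logged (band, decision) pairs are exactly what the shadow-vs-rule agreement tests in the advancement trigger below consume.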

Advancement trigger

Stage 1 advances to production enforcement when:
  1. Shadow mode has run on production traffic for at least 7 days
  2. The KNOWN_SAFE exit rate on calibration data is within 1 percentage point of the expected 95%
  3. The ANOMALOUS rate on calibration data is below 1% (otherwise the corroboration gate requires re-tuning)
  4. No correctness regressions surface in the shadow-vs-rule agreement tests
See the Scoring Pipeline page for the full gate architecture.
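The four conditions are mechanical enough to express as a single check. A hypothetical sketch (the function and its inputs are illustrative, not part of Quint's codebase):

```python
def stage1_ready(shadow_days: float, known_safe_rate: float,
                 anomalous_rate: float, regressions: int) -> bool:
    """Mirror the four advancement conditions above.
    Rates are fractions of calibration events (0.95 == 95%)."""
    return (shadow_days >= 7                          # condition 1
            and abs(known_safe_rate - 0.95) <= 0.01   # condition 2
            and anomalous_rate < 0.01                 # condition 3
            and regressions == 0)                     # condition 4
```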

Stage 2: LLM Triage — Planned

What it does

Claude Haiku classifies UNCERTAIN-band events from Stage 1 through a structured prompt containing the event context and fired signals. Returns benign/suspicious/malicious verdicts with confidence scores and natural-language rationales. Verdicts become soft labels that feed Stage 3’s eventual training corpus.
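The verdict-to-soft-label handoff might look like the sketch below. The JSON schema and the confidence-weighted mapping are assumptions for illustration; the production prompt template and response format are not specified on this page.

```python
import json
from dataclasses import dataclass

# Hypothetical verdict schema for illustration.
@dataclass
class TriageVerdict:
    verdict: str        # "benign" | "suspicious" | "malicious"
    confidence: float   # 0.0 - 1.0
    rationale: str      # natural-language explanation from the model

def parse_verdict(raw: str) -> TriageVerdict:
    """Parse a structured JSON response from the triage model."""
    obj = json.loads(raw)
    v = TriageVerdict(obj["verdict"], float(obj["confidence"]), obj["rationale"])
    assert v.verdict in {"benign", "suspicious", "malicious"}
    return v

def soft_label(v: TriageVerdict) -> float:
    """Map a verdict to a [0, 1] maliciousness soft label for Stage 3."""
    anchor = {"benign": 0.0, "suspicious": 0.5, "malicious": 1.0}[v.verdict]
    # Pull low-confidence verdicts toward the uncertain midpoint.
    return 0.5 + (anchor - 0.5) * v.confidence
```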

Latency

200–500 milliseconds per Haiku invocation. Stage 2 runs asynchronously on the warm path — events receive their Stage 1 band immediately and the Haiku verdict enriches the event record when it returns.

Data requirements

No training data required — Haiku is pretrained. The system requires a flowing pipeline of UNCERTAIN events to triage, which is produced by Stage 1 once it exits shadow mode.

Cost model

Approximately $0.0003 per event at current Bedrock pricing. Sampled at 5–10% of UNCERTAIN events, typical monthly cost per customer is under one dollar at moderate deployment scale.
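A worked instance of that arithmetic, with an assumed UNCERTAIN-event volume standing in for "moderate deployment scale":

```python
# Cost model from the text, with an assumed monthly event volume.
cost_per_event = 0.0003          # USD per Haiku invocation
uncertain_per_month = 30_000     # assumed moderate-scale UNCERTAIN volume
sample_rate = 0.10               # upper end of the 5-10% sampling band

monthly_cost = uncertain_per_month * sample_rate * cost_per_event
print(f"${monthly_cost:.2f}/month")   # → $0.90/month
```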

Advancement trigger

Stage 2 ships when:
  1. Stage 1 is live in production with stable UNCERTAIN-band output
  2. The cloud warm-path scoring consumer service is deployed
  3. A structured prompt template has been validated against a red-team event corpus

Stage 3: Supervised Tree Ensemble — Deferred

What it does

XGBoost gradient-boosted trees trained on labeled feature vectors. Replaces the hand-tuned thresholds in Stage 1’s Gate 3 corroboration logic with learned decision boundaries. Exported as ONNX and served via embedded runtime in the warm-path consumer at sub-millisecond inference latency.
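The shift from hand-tuned thresholds to learned boundaries can be illustrated with a single toy tree evaluated the way an embedded runtime traverses a flattened node array. The splits and leaf scores are invented for the sketch; they are not a trained Quint model.

```python
# Toy learned decision boundary over two Stage 1 deviation signals,
# stored in the flattened node-array form tree runtimes use internally.
#            node:  0     1     2     3     4
FEATURE   = [0,     1,   -1,   -1,   -1]   # -1 marks a leaf
THRESHOLD = [0.6,   0.4,  0.0,  0.0,  0.0]
LEFT      = [1,     3,   -1,   -1,   -1]
RIGHT     = [2,     4,   -1,   -1,   -1]
VALUE     = [0.0,   0.0,  0.9,  0.1,  0.7]  # leaf scores, P(anomalous)

def predict(x: list[float]) -> float:
    """Traverse the flattened tree exactly as an embedded runtime would."""
    node = 0
    while FEATURE[node] != -1:
        node = LEFT[node] if x[FEATURE[node]] <= THRESHOLD[node] else RIGHT[node]
    return VALUE[node]
```

A full ensemble is just many such trees summed, which is why CPU inference stays sub-millisecond: each prediction is a handful of array lookups with no allocation.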

Latency

Sub-millisecond ONNX inference on CPU. Stage 3 runs on the warm path with no new infrastructure — the pipeline writer loads the ONNX model at startup and checks for updates hourly.

Data requirements

Approximately 50,000 labeled events for initial training. Labels come from a combination of analyst feedback in the dashboard, Haiku soft labels from Stage 2, and controlled red-team exercises. Tree models tolerate noisy labels well, which is why Haiku-generated soft labels are viable as a large component of the training set.

Model choice rationale

Grinsztajn et al. (NeurIPS 2022) benchmarked tree-based ensembles against deep learning across 45 tabular datasets. Tree models consistently outperformed neural networks below approximately 500,000 training samples with modest feature dimensionality. Security telemetry is tabular and moderate-dimensional, and security teams prize interpretability — tree models provide feature importances natively.

Advancement trigger

Stage 3 ships when:
  1. Stage 2 has accumulated at least 50,000 labeled events with balanced class distribution
  2. A feature export pipeline writes labeled feature vectors to S3 Parquet for reproducible training
  3. Offline cross-validation on held-out data shows recall ≥ 85% at precision ≥ 75% on the ANOMALOUS band
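Trigger 3 is a concrete metric gate. A hypothetical check from confusion counts on the ANOMALOUS band:

```python
def meets_stage3_bar(tp: int, fp: int, fn: int) -> bool:
    """Recall >= 85% at precision >= 75% on the ANOMALOUS band,
    computed from held-out confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return recall >= 0.85 and precision >= 0.75
```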

Stage 4: Graph Neural Networks — Deferred

What it does

Temporal graph networks such as TGN or edge-feature graph networks such as E-GraphSAGE trained on session-level event graphs. Detects structural attack patterns that per-event scoring cannot observe: reconnaissance-staging-exfiltration chains, coordinated subagent activity, lateral movement across resource graphs.
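The input to such a model is a per-session graph rather than a per-event feature vector. A minimal construction sketch, where the node definition (tool plus resource) and the count-valued temporal edges are illustrative choices, not Quint's production schema:

```python
from collections import defaultdict

def build_session_graph(events: list[dict]) -> dict:
    """Build a session graph: nodes are (tool, resource) pairs touched in
    the session; directed edges carry transition counts as edge features."""
    nodes, edges = set(), defaultdict(int)
    prev = None
    for ev in events:                      # events ordered by timestamp
        node = (ev["tool"], ev.get("resource", ""))
        nodes.add(node)
        if prev is not None:
            edges[(prev, node)] += 1       # temporal edge with count feature
        prev = node
    return {"nodes": nodes, "edges": dict(edges)}
```

A reconnaissance-staging-exfiltration chain shows up here as a path structure (listing, then archiving, then uploading) that no single event's score can reveal.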

Latency

Inference takes 5–50 milliseconds per session graph for typical session sizes (hundreds of nodes, up to a thousand edges). Stage 4 runs on the warm path only — never inline. Session-level embeddings are cached and reused until the session grows materially.

Data requirements

Approximately 500,000 labeled sessions for reliable training. Published graph network work on network intrusion detection (E-GraphSAGE on CIC-IDS-2017 and UNSW-NB15) required 2.5 million labeled flows to outperform tree models by 2–3% F1. Security telemetry at agent-session granularity is comparable.

Why Quint will not train a GNN until the data supports it

Graph neural networks have thousands of learnable parameters even for modest architectures. Training on the few thousand events available today would produce a model that memorizes the training set and fails on unseen data. Published research (Sommer and Paxson, "Outside the Closed World", IEEE S&P 2010) documents this specific failure mode of security ML trained on synthetic or insufficient data. Rather than ship a model that does not generalize, Quint builds the closed-loop distillation path first: when Stage 4 eventually trains, its output flow-matrix signatures will be pushed to the edge, where Stage 1's probabilistic machinery matches them in under a millisecond. The graph model's role is to discover new attack shapes from large corpora; the edge's role is to recognize them at reflex speed.

Advancement trigger

Stage 4 ships when:
  1. Stage 3 is live and producing scores on all events in production
  2. The labeled session corpus exceeds 500,000 sessions with confirmed analyst labels
  3. Per-session graph construction from the actions and feature_vectors tables is optimized and reproducible
  4. Offline training on held-out customer deployments shows structural detection capability that tree models cannot match

Reading this roadmap

Every trigger on this page is observable. None of them depend on calendar dates or fundraising milestones. Stage advancement happens when the data justifies it, not when a roadmap slide promises it. Skipping rungs produces detection systems that fail in production. Advancing rungs on schedule produces detection systems that improve monotonically with the deployed corpus.

The ladder is deliberately slow by the standards of consumer AI. It is deliberately fast by the standards of enterprise security. The architecture is designed so that every stage ships something valuable in isolation — Stage 0 alone provides defensible detection, Stage 1 adds per-agent adaptation, Stage 2 adds semantic triage, Stage 3 adds learned calibration, Stage 4 adds structural pattern recognition. Each is sellable. Each is testable. Each makes the next one possible.