ML Roadmap
Quint’s detection architecture follows a five-stage ladder. Each stage has objective triggers that determine when it advances to production. This page is the authoritative record of each stage's status and the observable conditions required for advancement.
Stage 0: Deterministic Rules — Live
What it does
Pattern matching, regex scanning, and threshold logic over tool names, arguments, and behavioral counts. Catches explicit deny-list matches, known-dangerous command patterns, canary token access, and brute-force velocity violations.
Latency
Sub-microsecond per evaluation. Pure in-memory lookups with zero allocations on the fast path.
Data requirements
None. Rules work from day one of any deployment.
Known limitations
- Only catches patterns that have been hand-coded into rules
- Does not adapt to per-agent behavioral baselines
- High false-negative rate on novel attack patterns
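As an illustration of the kind of checks Stage 0 runs, here is a minimal sketch combining a regex deny-list, a canary-path check, and a sliding-window velocity rule. The patterns, canary path, and thresholds are invented for the example, not Quint's actual rule set.

```python
import re
from collections import deque

# Illustrative deny-list; real rule sets are hand-curated.
DENY_PATTERNS = [
    re.compile(r"rm\s+-rf\s+/"),             # destructive shell command
    re.compile(r"curl\s+.*\|\s*(sh|bash)"),  # pipe-to-shell download
]

# Hypothetical canary object: any access to it is an immediate deny.
CANARY_TOKENS = {"s3://canary-bucket/keys.txt"}

class VelocityRule:
    """Flags an agent that exceeds max_calls within window_s seconds."""
    def __init__(self, max_calls: int, window_s: float):
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls = deque()

    def record(self, now: float) -> bool:
        # Append the new call, evict calls outside the window, check count.
        self.calls.append(now)
        while self.calls and now - self.calls[0] > self.window_s:
            self.calls.popleft()
        return len(self.calls) > self.max_calls

def evaluate(tool: str, args: str, velocity: VelocityRule, now: float) -> str:
    if any(p.search(args) for p in DENY_PATTERNS):
        return "DENY:pattern"
    if args in CANARY_TOKENS:
        return "DENY:canary"
    if velocity.record(now):
        return "DENY:velocity"
    return "ALLOW"
```

Everything here is a table lookup, a precompiled regex match, or a deque operation, which is why this style of rule stays on the sub-microsecond fast path.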
How it advances
Stage 0 is a permanent foundation, not a stage to advance past. Every subsequent stage runs alongside the rule engine, not in place of it. Rules provide the interpretable, deterministic floor that makes the probabilistic stages safe to enable.
Stage 1: Probabilistic Fingerprints — Shadow Mode
What it does
Per-agent behavioral baseline using probabilistic data structures. Tracks tool usage history, frequency distributions, capability shape, and inter-action timing. Produces six deviation signals per event, corroborated through a four-gate pipeline into KNOWN_SAFE, UNCERTAIN, or ANOMALOUS band classifications.
Latency
Under one millisecond per evaluation on commodity hardware. The common-case path (95% of actions exit at Gate 1) completes in approximately 300 nanoseconds.
Data requirements
Approximately 500 actions per agent for baseline stabilization. In practical terms, a few hours to a few days of normal agent operation, depending on the agent’s activity level.
Current status
The full gate pipeline is implemented and tested. Production deployment runs the pipeline in shadow mode: all signals compute, all bands are determined, but enforcement decisions are made by Stage 0 alone. The shadow mode output is logged and compared against expected baselines while calibration data accumulates.
Advancement trigger
Stage 1 advances to production enforcement when:
- Shadow mode has run on production traffic for at least 7 days
- The KNOWN_SAFE exit rate on calibration data is within 1 percentage point of the expected 95%
- The ANOMALOUS rate on calibration data is below 1% (otherwise the corroboration gate requires re-tuning)
- No correctness regressions surface in the shadow-vs-rule agreement tests
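The band classification above can be sketched with a toy per-agent baseline. This uses a plain frequency counter as a stand-in for the probabilistic structures the real pipeline uses, and the gate order, thresholds, and the 500-action history requirement follow the description on this page; the 1% frequency cutoff is invented for the example.

```python
from collections import Counter

KNOWN_SAFE, UNCERTAIN, ANOMALOUS = "KNOWN_SAFE", "UNCERTAIN", "ANOMALOUS"

class AgentBaseline:
    """Toy per-agent tool-frequency baseline with a gated classifier."""
    def __init__(self):
        self.tool_counts = Counter()
        self.total = 0

    def observe(self, tool: str):
        self.tool_counts[tool] += 1
        self.total += 1

    def classify(self, tool: str, min_history: int = 500) -> str:
        # Gate 1: not enough history yet -> stay uncertain, keep learning.
        if self.total < min_history:
            return UNCERTAIN
        freq = self.tool_counts[tool] / self.total
        # Gate 2: common-case fast exit for well-established tools.
        if freq >= 0.01:
            return KNOWN_SAFE
        # Gate 3: corroborate - a never-seen tool is anomalous,
        # a rare-but-seen tool stays uncertain.
        return ANOMALOUS if self.tool_counts[tool] == 0 else UNCERTAIN
```

In shadow mode, the band returned here would be logged next to the Stage 0 decision rather than enforced, which is what the shadow-vs-rule agreement tests compare.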
Stage 2: LLM Triage — Planned
What it does
Claude Haiku classifies UNCERTAIN-band events from Stage 1 through a structured prompt containing the event context and fired signals. Returns benign/suspicious/malicious verdicts with confidence scores and natural-language rationales. Verdicts become soft labels that feed Stage 3’s eventual training corpus.
Latency
200–500 milliseconds per Haiku invocation. Stage 2 runs asynchronously on the warm path — events receive their Stage 1 band immediately and the Haiku verdict enriches the event record when it returns.
Data requirements
No training data required — Haiku is pretrained. The system requires a flowing pipeline of UNCERTAIN events to triage, which is produced by Stage 1 once it exits shadow mode.
Cost model
Approximately $0.0003 per event at current Bedrock pricing. Sampled at 5–10% of UNCERTAIN events, typical monthly cost per customer is under one dollar at moderate deployment scale.
Advancement trigger
Stage 2 ships when:
- Stage 1 is live in production with stable UNCERTAIN-band output
- The cloud warm-path scoring consumer service is deployed
- A structured prompt template has been validated against a red-team event corpus
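To make the triage shape and the cost model concrete, here is a hedged sketch: a structured prompt template, verdict parsing, and the sampling arithmetic from the cost model above. The prompt wording and JSON schema are illustrative, not Quint's validated template, and no model invocation is shown.

```python
import json

# Illustrative template; the real validated prompt is not reproduced here.
TRIAGE_PROMPT = """\
You are a security triage assistant. Classify the event below.
Event context: {context}
Fired signals: {signals}
Respond with JSON: {{"verdict": "benign|suspicious|malicious", \
"confidence": 0.0-1.0, "rationale": "..."}}"""

def build_prompt(context: str, signals: list) -> str:
    return TRIAGE_PROMPT.format(context=context, signals=", ".join(signals))

def parse_verdict(raw: str) -> dict:
    """Parse and validate the model's JSON reply into a soft label."""
    verdict = json.loads(raw)
    assert verdict["verdict"] in {"benign", "suspicious", "malicious"}
    return verdict

def monthly_triage_cost(uncertain_events_per_month: int,
                        sample_rate: float = 0.10,
                        cost_per_event: float = 0.0003) -> float:
    """Cost model: sampled fraction of UNCERTAIN events times per-event cost."""
    return uncertain_events_per_month * sample_rate * cost_per_event
```

At 30,000 UNCERTAIN events a month sampled at 10%, the cost works out to roughly $0.90, consistent with the under-a-dollar figure above.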
Stage 3: Supervised Tree Ensemble — Deferred
What it does
XGBoost gradient-boosted trees trained on labeled feature vectors. Replaces the hand-tuned thresholds in Stage 1’s Gate 3 corroboration logic with learned decision boundaries. Exported as ONNX and served via embedded runtime in the warm-path consumer at sub-millisecond inference latency.
Latency
Sub-millisecond ONNX inference on CPU. Stage 3 runs on the warm path with no new infrastructure — the pipeline writer loads the ONNX model at startup and checks for updates hourly.
Data requirements
Approximately 50,000 labeled events for initial training. Labels come from a combination of analyst feedback in the dashboard, Haiku soft labels from Stage 2, and controlled red-team exercises. Tree models tolerate noisy labels well, which is why Haiku-generated soft labels are viable as a large component of the training set.
Model choice rationale
Grinsztajn et al. (NeurIPS 2022) benchmarked tree-based ensembles against deep learning across 45 tabular datasets. Tree models consistently outperformed neural networks below approximately 500,000 training samples with modest feature dimensionality. Security telemetry is tabular and moderate-dimensional, and security teams prize interpretability; tree models provide feature importances natively.
Advancement trigger
Stage 3 ships when:
- Stage 2 has accumulated at least 50,000 labeled events with balanced class distribution
- A feature export pipeline writes labeled feature vectors to S3 Parquet for reproducible training
- Offline cross-validation on held-out data shows recall ≥ 85% at precision ≥ 75% on the ANOMALOUS band
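The cross-validation gate above is a simple arithmetic check over a confusion matrix on the held-out ANOMALOUS band. A minimal sketch, using the recall and precision thresholds stated above; the counts in the usage example are invented:

```python
def passes_offline_gate(tp: int, fp: int, fn: int,
                        min_recall: float = 0.85,
                        min_precision: float = 0.75) -> bool:
    """True when held-out ANOMALOUS-band metrics clear both thresholds.

    tp: anomalous events the model flagged
    fp: benign events the model flagged
    fn: anomalous events the model missed
    """
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall >= min_recall and precision >= min_precision
```

For example, 90 true positives with 10 misses and 20 false alarms gives recall 0.90 and precision about 0.82, which clears the gate; dropping to 80 true positives with 20 misses (recall 0.80) does not.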
Stage 4: Graph Neural Networks — Deferred
What it does
Temporal graph networks such as TGN or edge-feature graph networks such as E-GraphSAGE trained on session-level event graphs. Detects structural attack patterns that per-event scoring cannot observe: reconnaissance-staging-exfiltration chains, coordinated subagent activity, lateral movement across resource graphs.
Latency
Inference takes 5–50 milliseconds per session graph for typical session sizes (hundreds of nodes, up to a thousand edges). Stage 4 runs on the warm path only — never inline. Session-level embeddings are cached and reused until the session grows materially.
Data requirements
Approximately 500,000 labeled sessions for reliable training. Published graph network work on network intrusion detection (E-GraphSAGE on CIC-IDS-2017 and UNSW-NB15) required 2.5 million labeled flows to outperform tree models by 2–3% F1. Security telemetry at agent-session granularity is comparable.
Why Quint will not train a GNN until the data supports it
Graph neural networks have several thousand learnable parameters for even modest architectures. Training on the few thousand events available today would produce a model that memorizes the training set and fails on unseen data. Published research (Sommer and Paxson, Outside the Closed World, IEEE S&P 2010) has documented this specific failure mode of security ML trained on synthetic or insufficient data. Rather than ship a model that does not generalize, Quint builds the closed-loop distillation path first: when Stage 4 eventually trains, its output flow-matrix signatures will be pushed to the edge where Stage 1’s probabilistic machinery matches them in under a millisecond. The graph model’s role is to discover new attack shapes from large corpora; the edge’s role is to recognize them at reflex speed.
Advancement trigger
Stage 4 ships when:
- Stage 3 is live and producing scores on all events in production
- The labeled session corpus exceeds 500,000 sessions with confirmed analyst labels
- Per-session graph construction from the `actions` and `feature_vectors` tables is optimized and reproducible
- Offline training on held-out customer deployments shows structural detection capability that tree models cannot match
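As a sketch of what per-session graph construction could look like, the following builds a session graph where nodes are actions and directed edges link consecutive actions by the same agent, carrying the inter-action time delta as an edge feature. The row shape (`action_id`, `agent_id`, `ts`) is an assumption for illustration, not the actual schema of the `actions` table.

```python
def build_session_graph(actions: list) -> dict:
    """Build a session-level event graph from action rows.

    Nodes are action ids; each directed edge links an agent's previous
    action to its next one, annotated with the time delta between them,
    so coordinated or staged multi-agent activity shows up as structure.
    """
    events = sorted(actions, key=lambda a: a["ts"])
    nodes = [a["action_id"] for a in events]
    edges = []
    last_by_agent = {}  # agent_id -> most recent action row
    for a in events:
        prev = last_by_agent.get(a["agent_id"])
        if prev is not None:
            edges.append((prev["action_id"], a["action_id"],
                          {"dt": a["ts"] - prev["ts"]}))
        last_by_agent[a["agent_id"]] = a
    return {"nodes": nodes, "edges": edges}
```

A single deterministic pass over time-sorted rows like this is what makes the construction reproducible across training runs.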