Quint’s detection architecture is a five-stage ladder. Each stage adds a capability the previous stage lacks, and each stage depends on data the previous stage produced. Skipping rungs produces detection systems that either fire on everything (no calibration) or miss real attacks (no structured feedback loop).

The ladder is standard in security machine learning. CrowdStrike did not ship indicators of attack on day one — they shipped behavioral rules and built the ML layer on top of the resulting telemetry. SentinelOne’s Static AI for binary classification came after their behavioral agent had collected sufficient labeled data. Abnormal Security’s behavioral models for email security were seeded by a simpler heuristic scoring layer that ran for months before any neural network trained on the output.

Quint follows the same sequence deliberately.
Stage 1: Behavioral Baseline
Probabilistic per-agent baseline. Catches novel behavior, frequency spikes, capability drift. Requires a few hundred actions per agent to calibrate.
Stage 2: LLM Triage
Claude Haiku classifies ambiguous events. Produces soft labels at ~$0.0003 per event. Requires no data volume; bootstraps label generation for downstream stages.
Stage 3: Supervised
Gradient-boosted trees (XGBoost) trained on labeled feature vectors. Learns decision boundaries that hand-tuned thresholds approximate. Requires ~50K labeled events.
Some actions are always dangerous regardless of context. Reading /etc/shadow, running rm -rf /, accessing credential stores — these patterns do not require behavioral context to flag. Stage 0 catches them in microseconds with pure hash lookups and regex matching.

Rules are exact, fast, and interpretable. They are also brittle: they only catch what you wrote a rule for, and attackers are not obliged to stick to the patterns on your deny list. Stage 0 is the foundation the rest of the ladder is built on, not the entire system.
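As a concrete illustration, a stage 0 check can be little more than a hash-set lookup plus a handful of compiled regexes. The sketch below assumes a hypothetical Action record and uses made-up deny entries; it is not Quint's actual rule set.

```python
import hashlib
import re
from dataclasses import dataclass

# Exact-match entries are stored as hashes so the hot-path check is a set lookup.
DENY_HASHES = {
    hashlib.sha256(p.encode()).hexdigest()
    for p in ("/etc/shadow", "/root/.ssh/id_rsa")   # illustrative entries
}

# Compiled regexes for patterns that cannot be expressed as exact strings.
DENY_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),            # destructive delete at root
    re.compile(r"\.aws/credentials|\.netrc"),       # credential store access
]

@dataclass
class Action:            # hypothetical event record
    command: str
    path: str

def stage0_verdict(action: Action) -> bool:
    """Return True if the action matches a static deny rule."""
    if hashlib.sha256(action.path.encode()).hexdigest() in DENY_HASHES:
        return True
    return any(p.search(action.command) for p in DENY_PATTERNS)

# Flagged regardless of the agent's behavioral history.
assert stage0_verdict(Action(command="cat /etc/shadow", path="/etc/shadow"))
```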
Most attacks look like legitimate agent behavior in any individual event. The signal is that the behavior is unusual for this specific agent. A Claude Code session reading credential files is suspicious if the agent has never touched credential files before; it is normal if the agent works on DevOps infrastructure daily.

Stage 1 tracks what each agent has done historically using probabilistic data structures — Bloom filters, Count-Min Sketches, HyperLogLog, Markov chains, and capability distributions. Every new action is scored against the fingerprint in under a millisecond on the edge. Actions that fall within the agent’s established envelope pass silently (about 95% of traffic). Actions that deviate produce signals that feed the next stage.

See Probabilistic Data Structures for the details.
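A rough sketch of the idea, using only two of the structures named above: a Bloom filter for "seen before" and a Count-Min Sketch for approximate frequency. The sizes, hash counts, thresholds, and signal names are illustrative assumptions, not Quint's fingerprint format.

```python
import hashlib

class BloomFilter:
    """Set membership with false positives but no false negatives."""
    def __init__(self, size_bits: int = 8192, n_hashes: int = 4):
        self.size, self.n_hashes = size_bits, n_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

class CountMinSketch:
    """Approximate per-item counts in fixed memory (may overestimate)."""
    def __init__(self, width: int = 1024, depth: int = 4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cols(self, item: str):
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str) -> None:
        for row, col in enumerate(self._cols(item)):
            self.table[row][col] += 1

    def estimate(self, item: str) -> int:
        return min(self.table[row][col] for row, col in enumerate(self._cols(item)))

def score_action(seen_tools: BloomFilter, tool_freq: CountMinSketch, tool: str) -> list[str]:
    """Signals an action raises against this agent's fingerprint (illustrative thresholds)."""
    signals = []
    if tool not in seen_tools:
        signals.append("novel_tool")           # never observed for this agent
    elif tool_freq.estimate(tool) > 100:       # made-up frequency ceiling
        signals.append("frequency_spike")
    seen_tools.add(tool)
    tool_freq.add(tool)
    return signals
```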
When stage 1 signals are ambiguous — a novel tool but within normal frequency, a slight capability drift but stable timing — the system does not have enough information to confidently label the event. Stage 2 asks Claude Haiku.

Haiku receives a structured prompt containing the event context, fired signals, and recent session history. It returns a classification (benign / suspicious / malicious) with a confidence score and a natural-language explanation. At current Bedrock pricing, a sampled fraction of ambiguous events costs a few cents per customer per day, making this economically viable as a continuous background process.

The Haiku verdicts become soft labels. They are not ground truth — they are LLM opinions. But at scale, soft labels are the raw material that bootstraps supervised learning in stage 3.
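A minimal sketch of what such a triage call could look like, assuming the boto3 Bedrock Runtime client and the Claude 3 Haiku model ID. The prompt wording, JSON response contract, and the triage_event helper are illustrative, not Quint's production prompt.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

TRIAGE_PROMPT = """You are a security triage assistant.
Event: {event}
Fired signals: {signals}
Recent session history: {history}
Respond with JSON: {{"verdict": "benign|suspicious|malicious",
"confidence": 0.0-1.0, "explanation": "..."}}"""

def triage_event(event: dict, signals: list[str], history: list[dict]) -> dict:
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 300,
        "messages": [{
            "role": "user",
            "content": TRIAGE_PROMPT.format(
                event=json.dumps(event),
                signals=", ".join(signals),
                history=json.dumps(history[-10:]),   # last N events, illustrative
            ),
        }],
    }
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        body=json.dumps(body),
    )
    text = json.loads(response["body"].read())["content"][0]["text"]
    return json.loads(text)   # verdict + confidence + explanation, kept as a soft label
```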
Once labels have accumulated — a mixture of analyst confirmations, Haiku verdicts, and red-team synthesis — stage 3 trains a gradient-boosted tree ensemble on the labeled feature vectors the earlier stages produce. XGBoost learns decision boundaries that replace the hand-tuned thresholds in stage 1’s corroboration gate.

The choice of XGBoost over deep learning at this stage is principled. Grinsztajn et al. (NeurIPS 2022) benchmarked tree-based models against neural networks across 45 tabular datasets and found trees outperformed deep learning until datasets exceeded several hundred thousand samples with specific structural properties. Security telemetry is tabular data at modest scale. Tree models are the correct first trained model.
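For concreteness, a training sketch under the assumption that the feature matrix X and labels y have already been assembled from those label sources. The synthetic placeholder data and hyperparameters are illustrative, not tuned values.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score

# Placeholder data standing in for the accumulated labeled feature vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50_000, 32))             # ~50K events, 32 features (illustrative)
y = (rng.random(50_000) < 0.02).astype(int)   # rare positives

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = xgb.XGBClassifier(
    n_estimators=400,
    max_depth=6,
    learning_rate=0.05,
    # Reweight the rare positive class; security telemetry is heavily imbalanced.
    scale_pos_weight=(len(y_train) - y_train.sum()) / max(y_train.sum(), 1),
    eval_metric="aucpr",                      # PR-AUC suits rare-positive data
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

print("validation PR-AUC:", average_precision_score(y_val, model.predict_proba(X_val)[:, 1]))
```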
Stage 4 is the endpoint of the ladder. Per-event scoring, no matter how sophisticated, misses structural patterns that only become visible when an entire session is considered as a graph. An exfiltration attack that reads one sensitive file, writes to a staging location, and makes one outbound connection produces three individually unremarkable events. Viewed as a graph — with the session as the root and nodes for processes, files, and network destinations — the shape is diagnostic.

Temporal graph networks like TGN and edge-feature variants like E-GraphSAGE learn to score such graphs. Published work shows these models outperform tree-based approaches by single-digit percentage points on network intrusion detection benchmarks, but the performance gap only emerges with millions of labeled flows. Below that threshold, trees win.

Quint defers stage 4 until the labeled session corpus crosses the threshold at which graph models earn their performance gap. Shipping a graph neural network on insufficient data produces a model that overfits the training set and fails in production — worse than shipping nothing.
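For illustration, the session-as-graph representation might be assembled along these lines using networkx. The node kinds, edge attributes, and the session_graph helper are assumptions rather than the stage 4 feature schema, and the GNN scoring itself is omitted.

```python
import networkx as nx

def session_graph(session_id: str, events: list[dict]) -> nx.DiGraph:
    g = nx.DiGraph()
    g.add_node(session_id, kind="session")           # session node is the root
    for e in events:
        actor = e["process"]
        target = e["target"]                         # file path or network destination
        g.add_node(actor, kind="process")
        g.add_node(target, kind=e["target_kind"])    # "file" or "netdst"
        g.add_edge(session_id, actor, kind="spawned")
        g.add_edge(actor, target, kind=e["action"], ts=e["ts"])
    return g

# The three-step exfiltration example from above: individually unremarkable
# events whose combined shape (read -> write -> connect) is diagnostic.
events = [
    {"process": "python3", "target": "/home/dev/.aws/credentials",
     "target_kind": "file", "action": "read", "ts": 1},
    {"process": "python3", "target": "/tmp/stage.bin",
     "target_kind": "file", "action": "write", "ts": 2},
    {"process": "python3", "target": "203.0.113.7:443",
     "target_kind": "netdst", "action": "connect", "ts": 3},
]
g = session_graph("session-42", events)
print(g.number_of_nodes(), g.number_of_edges())      # 5 nodes, 4 edges
```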
The ladder is not a one-way pipeline. Stage 4 produces something the earlier stages can consume: distilled signatures. When a trained graph model identifies a recurring attack subgraph, the structural pattern is extracted and compressed into a flow-matrix signature that stage 1 can match in under a millisecond.

This closes the loop. The slow, expensive graph training happens in the cloud on a nightly or weekly cycle. The resulting signatures are pushed to every edge daemon. On the edge, pattern matching runs at hot-path latency, catching attacks the graph model learned about but using stage 1’s fast probabilistic machinery.

The overall system behaves like a learning immune system. New attack shapes discovered in the cloud become reflex-speed signatures at the edge. Stage 4 is the brain. Stage 1 is the nervous system. Each stage’s output becomes the next stage’s input, and the pipeline as a whole improves monotonically as the data accumulates.
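The flow-matrix format is not spelled out here, so the sketch below assumes a deliberately simplified stand-in: a distilled signature as an ordered sequence of (action, target-kind) steps that must appear as a subsequence of a session's recent events. It is only meant to show why this kind of matching stays on the hot path.

```python
from collections import deque

# Example signature distilled from the exfiltration subgraph described above.
EXFIL_SIGNATURE = (("read", "file"), ("write", "file"), ("connect", "netdst"))

def matches_signature(recent: deque, signature: tuple) -> bool:
    """Ordered subsequence match: a single pass over the ring buffer."""
    it = iter(recent)
    return all(step in it for step in signature)

recent_events = deque(maxlen=256)                    # per-session ring buffer, size illustrative
for step in [("read", "file"), ("exec", "proc"), ("write", "file"), ("connect", "netdst")]:
    recent_events.append(step)

print(matches_signature(recent_events, EXFIL_SIGNATURE))   # True
```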
The current deployment runs stages 0 and 1 on the edge. Stage 0 is live in production. Stage 1 is scaffolded and executes in shadow mode — the full gate pipeline computes signals on every event, but the production enforcement decision is deterministic pending validation of the behavioral baselines.

Stages 2, 3, and 4 are the deliberate roadmap. Each unlocks when the data produced by the prior stages justifies its training requirements. See ML Roadmap for stage-by-stage status and the observable triggers that advance each stage to production.