Swarm Audit · Real-Corpus Leaderboard

Precision, recall, and F1 computed over the 205-PR snapshot in benchmarks/real-corpus/scores/latest.json. The synthetic 500/500 regression suite appears in the lower panel as a self-consistency check. To reproduce locally, run npm run corpus:score-real.

Real-corpus headline

Overall precision
Overall recall
Overall F1

Per-detector

Detector TP FP TN FN Precision Recall F1
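The three derived columns follow from the four confusion counts by the standard definitions. The sketch below assumes a zero denominator is reported as 0.0 and that an overall figure is micro-averaged (counts summed across detectors before dividing); neither convention is stated on this page, and the Counts type and the sample numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion counts for one detector (hypothetical container)."""
    tp: int
    fp: int
    tn: int
    fn: int


def safe_div(num: float, den: float) -> float:
    # Assumption: report 0.0 when the denominator is zero.
    return num / den if den else 0.0


def precision(c: Counts) -> float:
    # Share of flagged PRs that were true findings.
    return safe_div(c.tp, c.tp + c.fp)


def recall(c: Counts) -> float:
    # Share of true findings that were flagged.
    return safe_div(c.tp, c.tp + c.fn)


def f1(c: Counts) -> float:
    # Harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return safe_div(2 * p * r, p + r)


def micro_total(rows: list[Counts]) -> Counts:
    # Assumption: the overall row sums counts across detectors.
    return Counts(
        tp=sum(r.tp for r in rows),
        fp=sum(r.fp for r in rows),
        tn=sum(r.tn for r in rows),
        fn=sum(r.fn for r in rows),
    )


# Made-up counts, not taken from latest.json.
example = Counts(tp=8, fp=2, tn=85, fn=10)
print(round(precision(example), 3), round(recall(example), 3), round(f1(example), 3))
```

TN does not enter any of the three metrics, so a detector that rarely fires can show a large TN count and still score a low F1.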

Labels in the current snapshot are AI-judged (claude-opus-4-7-baseline-judge) and are pending human relabeling under labels-v2. A detector is promoted to gate-eligible only once its F1 on human-labeled data reaches at least 0.5, as recorded in promotions.json.
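The promotion rule can be read as a two-part check: the score must come from human labels, and F1 must be at least 0.5. The sketch below is one possible reading. The schema of promotions.json is not shown on this page, so the field names (detector, label_source, f1), the detector names, and the scores are all hypothetical.

```python
import json

F1_THRESHOLD = 0.5  # threshold stated on this page

# Hypothetical entries; the real promotions.json layout may differ.
SAMPLE = """
[
  {"detector": "example-detector-a", "label_source": "human", "f1": 0.62},
  {"detector": "example-detector-b", "label_source": "ai", "f1": 0.81},
  {"detector": "example-detector-c", "label_source": "human", "f1": 0.41}
]
"""


def is_gate_eligible(entry: dict, threshold: float = F1_THRESHOLD) -> bool:
    # Both conditions must hold: human labels and F1 at or above the threshold.
    return entry.get("label_source") == "human" and entry.get("f1", 0.0) >= threshold


entries = json.loads(SAMPLE)
eligible = [e["detector"] for e in entries if is_gate_eligible(e)]
print(eligible)
```

Under this reading, an AI-judged F1 never qualifies a detector on its own, however high it is.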

Synthetic regression sidebar

Per-agent

Agent Cases Caught Catch rate

Per-category

Category Cases Caught Catch rate

The 1.000 catch rate on the synthetic corpus is a self-consistency check, not a measure of detection power: the generator and the detectors share the same regex vocabulary, so every generated case matches by construction.
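A toy illustration of why a shared vocabulary guarantees a perfect score. Catch rate is taken here to mean caught divided by cases, matching the table columns above. The two patterns, the category names, and the snippets are invented for this sketch and are not the project's actual vocabulary.

```python
import re

# Hypothetical shared vocabulary: used by both the generator and the detector.
VOCAB = {
    "hardcoded-secret": r"api_key\s*=\s*['\"]\w+['\"]",
    "eval-call": r"\beval\(",
}


def generate_cases() -> list[tuple[str, str]]:
    # The generator writes snippets that match the vocabulary by construction.
    return [
        ("hardcoded-secret", 'api_key = "abc123"'),
        ("eval-call", "result = eval(user_input)"),
    ]


def detect(snippet: str) -> set[str]:
    # The detector searches for the very same patterns.
    return {name for name, pattern in VOCAB.items() if re.search(pattern, snippet)}


def catch_rate(cases: list[tuple[str, str]]) -> float:
    caught = sum(1 for label, snippet in cases if label in detect(snippet))
    return caught / len(cases) if cases else 0.0


# Same issues written in a form the vocabulary does not cover.
out_of_vocab = [
    ("hardcoded-secret", 'token = os.environ.get("TOKEN", "abc123")'),
    ("eval-call", "exec(compile(src, 'f', 'exec'))"),
]

print(f"{catch_rate(generate_cases()):.3f}")
print(f"{catch_rate(out_of_vocab):.3f}")
```

The first figure only confirms that the generator and detectors agree with each other; the second shows how the same detectors can miss equivalent issues phrased outside the vocabulary, which is the gap the real-corpus numbers are meant to expose.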