Swarm Audit · Real-Corpus Leaderboard

Real-corpus headline

Loading…

Overall precision

Overall recall

Overall F1

Per-detector (click a column to sort)

Detector	TP	FP	TN	FN	Precision	Recall	F1

Labels in the current snapshot are AI-judged (claude-opus-4-7-baseline-judge), pending human re-label under labels-v2. Promotion to gate-eligible requires human-labeled F1 ≥ 0.5 in promotions.json.

Synthetic regression sidebar

Loading…

Per-agent

Agent	Cases	Caught	Catch rate

Per-category

Category	Cases	Caught	Catch rate

The 1.000 catch rate on the synthetic corpus is a self-consistency check, not detection power: the generator and detectors share the same regex vocabulary.