Where ARYA Wins, Where It Loses: Q1 2026 Benchmark Margin vs Best Available Baseline

Margin is measured in points over the best available baseline: the best probabilistic model or, for WorldArena, ARYA v1. FrontierMath (AIME 2024) is the only benchmark on which ARYA underperforms the field. Two further benchmarks, CausalBench (85.6%) and AI Safety Index (100%, A), have no comparable competitor baseline and are excluded from this chart.