Charts
cost · tokens · runtime
Compare quality vs cost at a glance, plus token and runtime
breakdowns.
Leaderboard
overall · category rings
Rank models overall and by category, with cost, success, and coverage
alongside.
Examples
animated outputs
See one real prompt per category and how different models respond to it,
animated.
Methodology
benchmark · scoring
What’s in the benchmark, how scores are computed, and how to interpret
the numbers.