Examples
One prompt per category, shown with the best, median, and worst responses to that prompt. Each score shown is for that single prompt, not the overall leaderboard.
Representative prompts with animated model outputs by category.