Urdu Bench

Cost vs Performance

Overall score (higher is better) plotted against total run cost (lower is better). The green quadrant, high score at low cost, is the target.
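To make "landing in the green quadrant" concrete, here is a minimal sketch. The page does not say where the quadrant boundaries are drawn, so this assumes they sit at the median score and median cost across the plotted models. The function name `in_target_quadrant`, the input shape, and the example numbers are all hypothetical and made up for illustration. They are not Urdu Bench results.

```python
from statistics import median


def in_target_quadrant(results):
    """Return model names whose score is above the median and cost below it.

    Assumption (not stated on the page): quadrant boundaries are the
    median score and the median total run cost across all models.
    `results` is a hypothetical mapping: model name -> (score, cost).
    """
    score_cut = median(score for score, _ in results.values())
    cost_cut = median(cost for _, cost in results.values())
    return sorted(
        name
        for name, (score, cost) in results.items()
        if score > score_cut and cost < cost_cut  # higher score, lower cost
    )


# Made-up numbers for illustration only.
example = {
    "model-a": (82.0, 4.10),
    "model-b": (75.5, 0.90),
    "model-c": (88.3, 0.50),
    "model-d": (60.2, 0.40),
}
print(in_target_quadrant(example))  # ['model-c']
```

Here the median score is 78.75 and the median cost is 0.70, so only `model-c` clears both cuts. A strong but expensive model such as `model-a` lands outside the target quadrant.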

Token Use

Total generated tokens per model. When the data is available, we split the total into tokens spent on the final answer and tokens spent on extra deliberation.

Model Time

The total time each model took to complete all requests.