Tasks

The 7 things businesses actually ask LLMs to do.

Every model is scored on these categories. Open one to see the prompts, the judges, and the leaderboard for that task.

84 prompts

Code generation

142 models tested · updated daily

Sample prompt

Write a TypeScript REST endpoint that validates a JWT and returns user data.

Top: Claude Sonnet 4

52 prompts

Reasoning & logic

142 models tested · updated daily

Sample prompt

Solve this multi-step math problem with a step-by-step explanation…

Top: o3

38 prompts

Email writing

142 models tested · updated daily

Sample prompt

Reply to this support ticket professionally and suggest a resolution.

Top: GPT-4o

47 prompts

Customer support

142 models tested · updated daily

Sample prompt

A user is angry about a missed delivery — write a response that de-escalates.

Top: Claude Sonnet 4

31 prompts

Legal review

142 models tested · updated daily

Sample prompt

Identify the top 3 risks in this MSA indemnification clause.

Top: GPT-4o

29 prompts

Financial analysis

142 models tested · updated daily

Sample prompt

Compute the IRR for this cash-flow series and summarise the investment thesis.

Top: Claude Sonnet 4

35 prompts

Summarisation

142 models tested · updated daily

Sample prompt

Summarise this 40-page meeting transcript in 5 executive bullets.

Top: Gemini 1.5 Pro

Upload your own prompts. We'll benchmark them across 50+ models and give you a private leaderboard.