Claude just beat GPT-5, Gemini, and Grok in real-world job tasks, according to OpenAI’s own study




  • OpenAI has released GDPval, a new evaluation system to test how AI performs at work-related tasks
  • Claude Opus 4.1 comes out in the lead, with ‘ChatGPT-5 high’ in second place
  • Tasks include things like emailing a response to a dissatisfied customer

We’re all familiar with AI benchmarks, which measure performance at certain tasks, but often these tasks don’t reflect the real world and how people actually use AI, especially at work.

To combat this problem, OpenAI, the maker of ChatGPT, is introducing GDPval, a new way of measuring AI model performance by comparing a model's output on real-world work tasks with a real human's across 44 occupations, from software developers and lawyers to registered nurses and mechanical engineers.



Source: TechRadar
