Surprise upset: GPT-5.5 beats Claude Fable 5 on brutal new Agents’ Last Exam benchmark
UC Berkeley RDI and more than 300 experts launched Agents’ Last Exam, a benchmark meant to test whether AI agents can complete long, economically valuable professional workflows. O...
