Humanity's Last Exam

HLE benchmarks frontier large language models (LLMs) on expert-level questions spanning science, math, and the humanities, and is billed as the hardest known AI evaluation dataset.

scale.com