Humanity's Last Exam
Humanity's Last Exam (HLE) benchmarks frontier LLMs on expert-level questions spanning science, math, and the humanities. It is presented as the hardest known AI evaluation dataset.