Demystifying evals for AI agents

www.anthropic.com