Anthropic describes agent evaluations as multi-turn, tool-using, state-modifying trials that require tasks, graders, traces, outcomes, harnesses, and evaluation suites.
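To make the vocabulary concrete, here is a minimal sketch of how those pieces could fit together. Everything in it is an illustrative assumption and not Anthropic's implementation: the names `Task`, `Trace`, `Outcome`, `run_trial`, `run_suite`, `toy_agent`, and `counter_grader` are hypothetical, the "tool" is a simple counter increment, and the agent is a hard-coded loop standing in for a model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]  # hypothetical environment state the agent may modify


@dataclass
class Task:
    name: str
    prompt: str
    initial_state: State


@dataclass
class Trace:
    # One entry per turn or tool call, in order.
    steps: List[str] = field(default_factory=list)


@dataclass
class Outcome:
    task: str
    passed: bool
    final_state: State
    trace: Trace


# Hypothetical signatures: both receive the task, the state, and the trace.
Agent = Callable[[Task, State, Trace], None]
Grader = Callable[[Task, State, Trace], bool]


def run_trial(task: Task, agent: Agent, grader: Grader) -> Outcome:
    """Harness for one trial: fresh state, run the agent, then grade."""
    state = dict(task.initial_state)  # copy so trials never share state
    trace = Trace()
    agent(task, state, trace)
    return Outcome(task.name, grader(task, state, trace), state, trace)


def run_suite(tasks: List[Task], agent: Agent, grader: Grader) -> List[Outcome]:
    """An evaluation suite here is just a list of tasks run through the harness."""
    return [run_trial(t, agent, grader) for t in tasks]


def toy_agent(task: Task, state: State, trace: Trace) -> None:
    # Stand-in for a multi-turn agent: call an "increment" tool until done.
    while state["counter"] < 3:
        state["counter"] += 1
        trace.steps.append(f"tool:increment -> {state['counter']}")


def counter_grader(task: Task, state: State, trace: Trace) -> bool:
    # Grades the final state, not the wording of any reply.
    return state["counter"] == 3


suite = [
    Task("from-zero", "Raise the counter to 3", {"counter": 0}),
    Task("from-two", "Raise the counter to 3", {"counter": 2}),
]
outcomes = run_suite(suite, toy_agent, counter_grader)
pass_rate = sum(o.passed for o in outcomes) / len(outcomes)
```

In this sketch the grader inspects the final state and has access to the trace, which reflects the point that agent trials modify state over several turns instead of producing a single answer to score.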
