Porting scBench to inspect_evals: lessons from running AI agents on single-cell data
Lessons from moving scBench’s public task set into the inspect_evals framework, including scoring rules, evaluation setup, and what run logs revealed.