Posts

Integration of human transcriptome and cardiac systems models

Study of the impact of signaling pathway alterations in heart disease

ReconBench: benchmarking LLMs on cardiac signaling network reconstruction

An Inspect-native replication and extension of Tewari et al. (2025). The recall results reproduce, precision and F1 are newly added metrics, and the main failure traces to an extraction-format mismatch rather than a reasoning gap.

Beyond sequence similarity: function-aware screening of synthesized DNA

Protein and DNA foundation models add a function-level screening layer that is robust to disguise. An ensemble of ESM-2 and Evo2 7B catches 85.8% of toxins at a 1-in-100 false-alarm rate. It complements, rather than replaces, the alignment-based policy screens used today.

Treasury Reapers: Sentient Arena Challenge 0 final report

Final report for my Treasury Reapers team submission to Sentient Arena Challenge 0: a goose-based OfficeQA agent, prompt and skill-contract iterations, leaderboard progression, and lessons from trace-driven optimization.

Porting scBench to inspect_evals: lessons from running AI agents on single-cell data

Lessons from moving scBench’s public task set into the inspect_evals framework, including scoring rules, evaluation setup, and what run logs revealed.