I’m a cognitive scientist working on the evaluation and alignment of large language models as the director of Modulo Research. We recently released a dataset of expert-annotated valid and invalid long-form reasoning solutions, intended to facilitate scalable oversight research (preprint here). We’re now finalizing a dataset of textual representations of the research processes followed by high-performing participants in an online research task, for use in improving LLM capability elicitation, and are writing up the results of our associated experiments. You can sign up to be notified when we release future datasets.
I’m also grateful to have had the opportunity to contribute to Anwar et al.’s monumental agenda paper, Foundational Challenges in Assuring Alignment and Safety of Large Language Models, and to some of Anthropic’s Frontier Red Team evaluation/demo projects as part of collaborations with Hidden Variable Limited.
See my Google Scholar profile for a list of my most cited works, and the bottom of this page for recent updates that may not be reflected there.
Recognition
- My sole-authored preprint “Teaching autoregressive language models complex tasks by demonstration” has been cited by papers out of Google Brain and DeepMind and was discussed on Machine Learning Street Talk
- One of four winners of the AI Impacts essay competition on the Automation of Wisdom and Philosophy (out of 90 entries)
- Third Prize recipient in the Inverse Scaling Prize competition, which focused on identifying tasks where larger language models exhibit decreased performance
- Co-authored “Risk perceptions of COVID-19 around the world”, referenced by U.S. News, The Telegraph, The Daily Mail, BBC Future, and 130 other outlets
“Are you the same Gabriel Recchia who…?”
In a former life, I did things like:
- leading user testing and evaluation of patient-friendly genetic reports and of the widely used prognostic tool Predict: Breast Cancer at the University of Cambridge’s Winton Centre for Risk and Evidence Communication
- investigating capabilities, properties, and applications of distributional models trained on lots of text
- conducting various studies of human semantic memory and of how risk is communicated, perceived, and predicted
- writing an alphabet book about exoplanets (sadly uncalibrated to the reading level of any child young enough to still be interested in alphabet books)
Recent papers, preprints, and work in progress
- Recchia, G., Mangat, C., Nyachhyon, J., Sharma, M., Canavan, C., Epstein-Gross, D., & Abdulbari, M. (in prep.). Automation bias: A challenge for scalable oversight. Presents results of two sandwiching-like experiments intended to establish baselines for simple approaches.
- Recchia, G., Mangat, C. S., Li, I., & Krishnakumar, G. (2025). FindTheFlaws: Annotated errors for use in scalable oversight research. Link
- Phan, L., Gatti, A., Han, Z., Li, N., Hu, J., Zhang, H., … & Verbeken, B. (2025). Humanity’s Last Exam. Link. Co-author on account of contributing question(s) that were selected for the dataset.
- Anwar, U., Saparov, A., Rando, J., Paleka, D., Turpin, M., Hase, P., … & Krueger, D. (2024). Foundational challenges in assuring alignment and safety of large language models. Transactions on Machine Learning Research. Link
- McKenzie, I. R., Lyzhov, A., Pieler, M., Parrish, A., Mueller, A., Prabhu, A., … & Perez, E. (2023). Inverse scaling: When bigger isn’t better. Transactions on Machine Learning Research. Link. Co-author on account of submitting a winning task (i.e., identifying a task on which language model performance decreases with scale).
- Proto, R., Recchia, G., Dryhurst, S., & Freeman, A. L. (2023). Do colored cells in risk matrices affect decision-making and risk perception? Insights from randomized controlled studies. Risk Analysis. Link
- Recchia, G., Lawrence, A. C. E., Capacchione, L., & Freeman, A. L. J. (2022). Making BRCA1 genetic test reports easier to understand through user-centered design: A randomized trial. Genetics in Medicine. Link
- Recchia, G. (2021). Teaching autoregressive language models complex tasks by demonstration. Link. Early preprint demonstrating an example of capability elicitation via fine-tuning. Cited by papers out of DeepMind and Google Research.
More at Google Scholar