Research — Juan Manuel Contreras

AI safety and evaluation

Contreras, J. M. (2026). An LLM-native psychometric instrument does not predict LLM behavior: Evidence across 25 models. arXiv:2606.09843. Under review, NeurIPS 2026 (Evaluations & Datasets Track).
Contreras, J. M., & Carpenter, M. (2026). Adjudication in the age of machina legalis: A governance architecture for trustworthy and responsible AI in courts. Revise and resubmit, Cambridge Forum on AI: Law and Governance.
Contreras, J. M. (2025). Automated evaluation of gender bias across 13 large multimodal models. arXiv:2509.07050.
Contreras, J. M. (2025). Policy-grounded safety evaluation of 20 large language models. arXiv:2507.14719.
Morehouse, K. N.*, Contreras, J. M.*, Pan, W., & Banaji, M. R. (2024). Bias transmission in large language models: Evidence from gender-occupation bias in GPT-4. Next Generation of AI Safety Workshop, ICML 2024.

Leshinskaya, A., Contreras, J. M., Caramazza, A., & Mitchell, J. P. (2017). Neural representations of belief concepts: A representational similarity approach to social semantics. Cerebral Cortex, 27(1), 344–357.
Tamir, D. I., Thornton, M. A., Contreras, J. M., & Mitchell, J. P. (2016). Neural evidence that three dimensions organize mental state representation: Rationality, social impact, and valence. Proceedings of the National Academy of Sciences, 113(1), 194–199.
Contreras, J. M., Banaji, M. R., & Mitchell, J. P. (2013). Multivoxel patterns in fusiform face area differentiate faces by sex and race. PLOS ONE, 8(7), e69684.
Contreras, J. M., Schirmer, J., Banaji, M. R., & Mitchell, J. P. (2013). Common brain regions with distinct patterns of neural responses during mentalizing about groups and individuals. Journal of Cognitive Neuroscience, 25(9), 1406–1417.
Durante, F., Fiske, S. T., Kervyn, N., Cuddy, A. J. C., ... Contreras, J. M., ... Storari, C. C. (2013). Nations' income inequality predicts ambivalence in stereotype content: How societies mind the gap. British Journal of Social Psychology, 52(4), 726–746. (38 authors)
Contreras, J. M., Banaji, M. R., & Mitchell, J. P. (2012). Dissociable neural correlates of stereotypes and other forms of semantic knowledge. Social Cognitive and Affective Neuroscience, 7(7), 764–770.

Contreras, J. M. (2023). Data science needs you, social scientist. In Non-academic careers for quantitative social scientists: A practical guide to maximizing your skills and opportunities (pp. 9–13). Springer International Publishing.
Shephard, D., Contreras, J. M., Meuris, J., Kaat, A., Bailey, S., Custers, A., & Spencer, N. (2017). Beyond financial literacy: The psychological dimensions of financial capability (Technical Report). Think Forward Initiative.