About me

I am a Research Scientist at Scale AI. My work focuses on making LLM systems more reliable: evaluating where they succeed and fail, designing post-training data programs to address their failure modes, and studying how reward design shapes model behavior. A central question in my research is how to design robust and interpretable reward models that mitigate reward hacking in reinforcement learning.

Previously, I completed a PhD in Statistics at the University of Chicago with Victor Veitch, focusing on alignment and interpretability of large generative models. I also interned on ByteDance’s LLM Security team and was a student researcher at Google DeepMind with Sanmi Koyejo. Before graduate school, I earned a B.S. in Computational and Applied Mathematics from UChicago.

Zihao Wang