Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training

Published in International Conference on Learning Representations (ICLR), 2026

Recommended citation: Junkai Zhang*, Zihao Wang*, Lin Gui*, Swarnashree Mysore Sathyendra, Jaehwan Jeong, Victor Veitch, Wei Wang, Yunzhong He, Bing Liu, and Lifeng Jin. "Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training." International Conference on Learning Representations (ICLR), 2026. *Equal contribution. https://openreview.net/forum?id=pBjy4ek2QV

Reinforcement fine-tuning often suffers from reward over-optimization, where a policy model learns to hack reward signals and receive high scores while producing lower-quality outputs. We show that a key issue is reward misspecification in the high-reward tail: the reward model struggles to distinguish excellent responses from merely great ones. To address this, we study rubric-based rewards, which can leverage off-policy examples while remaining less sensitive to their artifacts. We introduce a workflow for eliciting rubrics that capture distinctions among strong and diverse responses, and empirically show that rubric-based rewards substantially mitigate reward over-optimization and improve LLM post-training.
