Online Rubrics Elicitation from Pairwise Comparisons

Published in Proceedings of the 43rd International Conference on Machine Learning (ICML), 2026

Recommended citation: MohammadHossein Rezaei, Robert Vacareanu, Zihao Wang, Clinton Wang, Bing Liu, Yunzhong He, and Afra Feyza Akyürek. "Online Rubrics Elicitation from Pairwise Comparisons." Proceedings of the 43rd International Conference on Machine Learning (ICML), 2026. https://icml.cc/virtual/2026/poster/65623

Rubrics provide a flexible way to train LLMs on open-ended long-form answers where verifiable rewards are not available and human preferences provide only coarse signals. Static rubrics, however, can be vulnerable to reward hacking and can fail to capture new desiderata that emerge during training. We introduce Online Rubrics Elicitation, a method that dynamically curates evaluation criteria through pairwise comparisons between responses from the current and reference policies. This online process continuously identifies and mitigates errors as training proceeds, yielding consistent improvements over training exclusively with static rubrics.
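
To make the idea concrete, here is a minimal toy sketch of growing a rubric from a pairwise comparison. It is an illustration under stated assumptions, not the paper's implementation: the names `elicit_criteria` and `rubric_score`, the keyword and length checks, and the example strings are all hypothetical. In a real system the criteria would presumably be proposed and graded by an LLM rather than by hand-written rules.

```python
from typing import Callable, Dict

# Hypothetical representation: a rubric maps a criterion name to a check on a response.
Criterion = Callable[[str], bool]
Rubric = Dict[str, Criterion]


def elicit_criteria(current: str, reference: str) -> Rubric:
    """Toy stand-in for an extractor that compares two responses.

    Assumption for illustration only: if the current-policy response is more
    than twice as long as the reference response, propose a conciseness criterion.
    """
    new_criteria: Rubric = {}
    limit = 2 * len(reference.split())
    if len(current.split()) > limit:
        # Bind `limit` as a default argument so the check keeps its own threshold.
        new_criteria["is_concise"] = lambda r, limit=limit: len(r.split()) <= limit
    return new_criteria


def rubric_score(response: str, rubric: Rubric) -> float:
    """Fraction of rubric criteria that the response satisfies."""
    if not rubric:
        return 0.0
    return sum(check(response) for check in rubric.values()) / len(rubric)


# Static rubric: a single criterion that a padded response can still satisfy.
rubric: Rubric = {"mentions_answer": lambda r: "42" in r}

reference_response = "The answer is 42."
current_response = "The answer is 42. " + "Indeed, truly, certainly. " * 5

score_before = rubric_score(current_response, rubric)

# Online step: compare the pair and add any newly elicited criteria.
rubric.update(elicit_criteria(current_response, reference_response))
score_after = rubric_score(current_response, rubric)

print(score_before, score_after)
```

In this toy run the padded response earns full credit (1.0) under the static rubric, and its score falls to 0.5 once the comparison with the reference response adds the conciseness criterion. The reference response still scores 1.0 under the updated rubric.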