RewardBench: Evaluating Reward Models for Language Modeling — arXiv:2403.13787, published Mar 20, 2024
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning — arXiv:2402.14809, published Feb 22, 2024