SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models Paper • 2412.11605 • Published 23 days ago • 16
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning Paper • 2411.02337 • Published Nov 4, 2024 • 35