SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models Paper • 2412.11605 • Published 23 days ago
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents Paper • 2410.24024 • Published Oct 31, 2024
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models Paper • 2406.16714 • Published Jun 24, 2024
Black-Box Prompt Optimization: Aligning Large Language Models without Model Training Paper • 2311.04155 • Published Nov 7, 2023