Training Software Engineering Agents and Verifiers with SWE-Gym • arXiv:2412.21139 • Published Dec 2024
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks • arXiv:2412.14161 • Published Dec 2024
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale • arXiv:2412.05237 • Published Dec 6, 2024
Evaluating Language Models as Synthetic Data Generators • arXiv:2412.03679 • Published Dec 4, 2024
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs • arXiv:2411.14199 • Published Nov 21, 2024
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation • arXiv:2410.17250 • Published Oct 22, 2024
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages • arXiv:2410.16153 • Published Oct 21, 2024
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples • arXiv:2410.14669 • Published Oct 18, 2024
Harnessing Webpage UIs for Text-Rich Visual Understanding • arXiv:2410.13824 • Published Oct 17, 2024
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark • arXiv:2409.02813 • Published Sep 4, 2024
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents • arXiv:2407.16741 • Published Jul 23, 2024
Advancing LLM Reasoning Generalists with Preference Trees • arXiv:2404.02078 • Published Apr 2, 2024
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales • arXiv:2405.20974 • Published May 31, 2024
A Single Transformer for Scalable Vision-Language Modeling • arXiv:2407.06438 • Published Jul 8, 2024
VIMI: Grounding Video Generation through Multi-modal Instruction • arXiv:2407.06304 • Published Jul 8, 2024
Training Task Experts through Retrieval Based Distillation • arXiv:2407.05463 • Published Jul 7, 2024