TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published 19 days ago • 49
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published Dec 6, 2024 • 47
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published Dec 4, 2024 • 46
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published Nov 7, 2024 • 113
PULSE-ECG Collection Teach Multimodal LLMs to Comprehend Electrocardiographic Images • 5 items • Updated Oct 28, 2024 • 3
Teach Multimodal LLMs to Comprehend Electrocardiographic Images Paper • 2410.19008 • Published Oct 21, 2024 • 23
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published Oct 21, 2024 • 44
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17, 2024 • 75
Harnessing Webpage UIs for Text-Rich Visual Understanding Paper • 2410.13824 • Published Oct 17, 2024 • 30
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14, 2024 • 38
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Paper • 2409.02813 • Published Sep 4, 2024 • 29
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures Paper • 2406.06565 • Published Jun 3, 2024 • 9
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization Paper • 2405.15071 • Published May 23, 2024 • 37
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI Paper • 2311.16502 • Published Nov 27, 2023 • 35
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning Paper • 2309.05653 • Published Sep 11, 2023 • 10