VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation Paper • 2412.21059 • Published 7 days ago
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM Paper • 2501.01904 • Published 3 days ago
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published 3 days ago
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation Paper • 2501.01895 • Published 3 days ago
Dolphin 3.0 Collection • Dolphin 3.0 is the next generation of the Dolphin series of instruct-tuned models, designed to be the ultimate general-purpose local model. • 7 items • Updated 1 day ago
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models Paper • 2501.00316 • Published 7 days ago
MapQaTor: A System for Efficient Annotation of Map Query Datasets Paper • 2412.21015 • Published 7 days ago
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published 6 days ago
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published 4 days ago
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Paper • 2501.01427 • Published 4 days ago
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 5 days ago
MLLM-as-a-Judge for Image Safety without Human Labeling Paper • 2501.00192 • Published 7 days ago
ProgCo: Program Helps Self-Correction of Large Language Models Paper • 2501.01264 • Published 4 days ago
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published 10 days ago
On the Compositional Generalization of Multimodal LLMs for Medical Imaging Paper • 2412.20070 • Published 10 days ago
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published 13 days ago
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published 12 days ago