Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation • Paper • 2501.03059 • Published 1 day ago • 12
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM • Paper • 2501.00599 • Published 7 days ago • 39
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation • Paper • 2412.21059 • Published 8 days ago • 15
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM • Paper • 2501.01904 • Published 4 days ago • 22
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models • Paper • 2501.01423 • Published 5 days ago • 33
MLLM-as-a-Judge for Image Safety without Human Labeling • Paper • 2501.00192 • Published 8 days ago • 22
ProgCo: Program Helps Self-Correction of Large Language Models • Paper • 2501.01264 • Published 6 days ago • 23
Slow Perception: Let's Perceive Geometric Figures Step-by-step • Paper • 2412.20631 • Published 9 days ago • 12
PERSE: Personalized 3D Generative Avatars from A Single Portrait • Paper • 2412.21206 • Published 8 days ago • 15
MMFactory: A Universal Solution Search Engine for Vision-Language Tasks • Paper • 2412.18072 • Published 15 days ago • 14
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning • Paper • 2412.15797 • Published 19 days ago • 16
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search • Paper • 2412.18319 • Published 15 days ago • 34