Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models Paper • 2501.01423 • Published 5 days ago • 33
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published 7 days ago • 39
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published 11 days ago • 73
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published 15 days ago • 34
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization Paper • 2412.17739 • Published 15 days ago • 37
Outcome-Refining Process Supervision for Code Generation Paper • 2412.15118 • Published 19 days ago • 19
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Paper • 2412.17256 • Published 16 days ago • 44
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up Paper • 2412.16112 • Published 18 days ago • 21
No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published 23 days ago • 41
ColorFlow: Retrieval-Augmented Image Sequence Colorization Paper • 2412.11815 • Published 23 days ago • 26
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 25 days ago • 136
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training Paper • 2412.09619 • Published 26 days ago • 20
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published 28 days ago • 50
Moto: Latent Motion Token as the Bridging Language for Robot Manipulation Paper • 2412.04445 • Published Dec 5, 2024 • 21
Monet: Mixture of Monosemantic Experts for Transformers Paper • 2412.04139 • Published Dec 5, 2024 • 11
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion Paper • 2412.04462 • Published Dec 5, 2024 • 7
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation Paper • 2412.04448 • Published Dec 5, 2024 • 9
One Shot, One Talk: Whole-body Talking Avatar from a Single Image Paper • 2412.01106 • Published Dec 2, 2024 • 18