Deliberation in Latent Space via Differentiable Cache Augmentation Paper • 2412.17747 • Published 14 days ago • 28
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 121
Mimir: Improving Video Diffusion Models for Precise Text Understanding Paper • 2412.03085 • Published Dec 4, 2024 • 12
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published Nov 26, 2024 • 77
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models Paper • 2411.13503 • Published Nov 20, 2024 • 30
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory Paper • 2411.11922 • Published Nov 18, 2024 • 18
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Paper • 2410.17434 • Published Oct 22, 2024 • 25
Aurora Series: AuroraCap Collection Efficient, Performant Video Detailed Captioning and a New Benchmark • 8 items • Updated Oct 26, 2024 • 3
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark Paper • 2410.03051 • Published Oct 4, 2024 • 5
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second Paper • 2410.02073 • Published Oct 2, 2024 • 41
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Paper • 2409.20566 • Published Sep 30, 2024 • 55
view article Article Llama can now see and run on your device - welcome Llama 3.2 Sep 25, 2024 • 180
LLaVA-Onevision Collection LLaVa_Onevision models for single-image, multi-image, and video scenarios • 9 items • Updated Sep 18, 2024 • 12
Prithvi WxC: Foundation Model for Weather and Climate Paper • 2409.13598 • Published Sep 20, 2024 • 40
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets Paper • 2406.13897 • Published May 30, 2024 • 12
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19, 2024 • 48
See and Think: Embodied Agent in Virtual Environment Paper • 2311.15209 • Published Nov 26, 2023 • 2