LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks Paper • 2410.01744 • Published Oct 2, 2024 • 26
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Dec 6, 2024 • 188
Floating No More: Object-Ground Reconstruction from a Single Image Paper • 2407.18914 • Published Jul 26, 2024 • 20