shawon's Collections
L&V Models
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 88
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework
Paper • 2403.13248 • Published • 78
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 48
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
Paper • 2409.20551 • Published • 14
Visual Question Decomposition on Multimodal Large Language Models
Paper • 2409.19339 • Published • 8
Image Copy Detection for Diffusion Models
Paper • 2409.19952 • Published • 13
FreeInit: Bridging Initialization Gap in Video Diffusion Models
Paper • 2312.07537 • Published • 25