Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 22 days ago • 136
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks Paper • 2208.10442 • Published Aug 22, 2022
RedStone: Curating General, Code, Math, and QA Data for Large Language Models Paper • 2412.03398 • Published Dec 4, 2024 • 1
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published 24 days ago • 41