FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published 21 days ago
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers Paper • 2412.12571 • Published 22 days ago
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Paper • 2412.14171 • Published 20 days ago
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities Paper • 2412.14123 • Published 20 days ago
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning Paper • 2412.12953 • Published 22 days ago
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN Paper • 2412.13795 • Published 21 days ago
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published 20 days ago
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 21 days ago
No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published 23 days ago
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 26 days ago
Causal Diffusion Transformers for Generative Modeling Paper • 2412.12095 • Published 22 days ago
Smaller Language Models Are Better Instruction Evolvers Paper • 2412.11231 • Published 23 days ago
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models Paper • 2412.09645 • Published 28 days ago
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation Paper • 2412.11919 • Published 22 days ago
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance Paper • 2412.06673 • Published 29 days ago