Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 18 days ago • 116
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 16 days ago • 112
The Open Source Advantage in Large Language Models (LLMs) Paper • 2412.12004 • Published 19 days ago • 9
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Paper • 2412.09604 • Published 23 days ago • 35
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 22 days ago • 136
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions Paper • 2412.08737 • Published 24 days ago • 52
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published 23 days ago • 92
POINTS1.5: Building a Vision-Language Model towards Real World Applications Paper • 2412.08443 • Published 24 days ago • 38
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations Paper • 2412.08580 • Published 24 days ago • 45
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published 25 days ago • 50
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published 25 days ago • 46
Evaluating and Aligning CodeLLMs on Human Preference Paper • 2412.05210 • Published 29 days ago • 47
STIV: Scalable Text and Image Conditioned Video Generation Paper • 2412.07730 • Published 25 days ago • 70
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 26 days ago • 66
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published 26 days ago • 72
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation Paper • 2412.06531 • Published 26 days ago • 71
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published 29 days ago • 46
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases Paper • 2412.04862 • Published 30 days ago • 49