Collections
Discover the best community collections!
Collections including paper arxiv:2412.15119
-
Parallelized Autoregressive Visual Generation
Paper • 2412.15119 • Published • 49 -
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
Paper • 2412.17153 • Published • 34 -
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models
Paper • 2412.18609 • Published • 13
-
Video Creation by Demonstration
Paper • 2412.09551 • Published • 8 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 46 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 71 -
APOLLO: SGD-like Memory, AdamW-level Performance
Paper • 2412.05270 • Published • 38
-
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
Paper • 2412.15213 • Published • 25 -
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper • 2412.11768 • Published • 41 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 120 -
Autoregressive Video Generation without Vector Quantization
Paper • 2412.14169 • Published • 14
-
GenEx: Generating an Explorable World
Paper • 2412.09624 • Published • 87 -
IamCreateAI/Ruyi-Mini-7B
Image-to-Video • Updated • 16.1k • 569 -
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
Paper • 2412.06016 • Published • 20 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 85
-
Causal Diffusion Transformers for Generative Modeling
Paper • 2412.12095 • Published • 23 -
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
Paper • 2412.09619 • Published • 20 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 46 -
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
Paper • 2412.15213 • Published • 25
-
XLabs-AI/flux-RealismLora
Text-to-Image • Updated • 241k • • 940 -
StyleMaster: Stylize Your Video with Artistic Generation and Translation
Paper • 2412.07744 • Published • 19 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 46 -
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Paper • 2412.09501 • Published • 43
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 33 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 26 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 121 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 21
-
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Paper • 2311.17049 • Published • 1 -
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 14 -
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Paper • 2303.17376 • Published -
Sigmoid Loss for Language Image Pre-Training
Paper • 2303.15343 • Published • 6