Collections including paper arxiv:2412.15119

Parallelized Autoregressive Visual Generation

Paper • 2412.15119 • Published 19 days ago • 49
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up

Paper • 2412.16112 • Published 18 days ago • 21
1.58-bit FLUX

Paper • 2412.18653 • Published 14 days ago • 68

AR Image Generation

Parallelized Autoregressive Visual Generation

Paper • 2412.15119 • Published 19 days ago • 49
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching

Paper • 2412.17153 • Published 16 days ago • 34
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models

Paper • 2412.18609 • Published 14 days ago • 13

Image generation

Parallelized Autoregressive Visual Generation

Paper • 2412.15119 • Published 19 days ago • 49

Video Creation by Demonstration

Paper • 2412.09551 • Published 26 days ago • 8
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published 29 days ago • 46
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Paper • 2412.06531 • Published 30 days ago • 71
APOLLO: SGD-like Memory, AdamW-level Performance

Paper • 2412.05270 • Published Dec 6, 2024 • 38

Paper - Multimodal

Papers related to multimodal models - research for a: Modular, Multimodal, Multi-Stream, Mixture of Experts, Universal Transformer, Matryoshka embedding

Flowing from Words to Pixels: A Framework for Cross-Modality Evolution

Paper • 2412.15213 • Published 19 days ago • 25
No More Adam: Learning Rate Scaling at Initialization is All You Need

Paper • 2412.11768 • Published 23 days ago • 41
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published 21 days ago • 120
Autoregressive Video Generation without Vector Quantization

Paper • 2412.14169 • Published 20 days ago • 14

GenEx: Generating an Explorable World

Paper • 2412.09624 • Published 26 days ago • 87
IamCreateAI/Ruyi-Mini-7B

Image-to-Video • Updated 14 days ago • 16.1k • 569
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

Paper • 2412.06016 • Published about 1 month ago • 20
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published 26 days ago • 85

Image Generation

Causal Diffusion Transformers for Generative Modeling

Paper • 2412.12095 • Published 22 days ago • 23
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

Paper • 2412.09619 • Published 26 days ago • 20
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published 29 days ago • 46
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution

Paper • 2412.15213 • Published 19 days ago • 25

XLabs-AI/flux-RealismLora

Text-to-Image • Updated Aug 22, 2024 • 241k • 940
StyleMaster: Stylize Your Video with Artistic Generation and Translation

Paper • 2412.07744 • Published 28 days ago • 19
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published 29 days ago • 46
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Paper • 2412.09501 • Published 26 days ago • 43

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published Sep 3, 2024 • 33
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published Sep 17, 2024 • 26
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27, 2024 • 121
Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17, 2024 • 21

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper • 2311.17049 • Published Nov 28, 2023 • 1
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 14
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Paper • 2303.17376 • Published Mar 30, 2023
Sigmoid Loss for Language Image Pre-Training

Paper • 2303.15343 • Published Mar 27, 2023 • 6
