WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens Paper • 2401.09985 • Published Jan 18, 2024 • 15
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects Paper • 2401.09962 • Published Jan 18, 2024 • 8
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution Paper • 2401.10404 • Published Jan 18, 2024 • 10
ActAnywhere: Subject-Aware Video Background Generation Paper • 2401.10822 • Published Jan 19, 2024 • 13
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23, 2024 • 86
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning Paper • 2402.00769 • Published Feb 1, 2024 • 22
VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20, 2024 • 23
Video ReCap: Recursive Captioning of Hour-Long Videos Paper • 2402.13250 • Published Feb 20, 2024 • 25
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis Paper • 2402.14797 • Published Feb 22, 2024 • 20
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27, 2024 • 88
Sora Generates Videos with Stunning Geometrical Consistency Paper • 2402.17403 • Published Feb 27, 2024 • 16
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners Paper • 2402.17723 • Published Feb 27, 2024 • 16
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29, 2024 • 32
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models Paper • 2403.03100 • Published Mar 5, 2024 • 34
Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation Paper • 2403.02827 • Published Mar 5, 2024 • 6
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding Paper • 2403.09626 • Published Mar 14, 2024 • 13
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations Paper • 2108.01073 • Published Aug 2, 2021 • 7
AnimateDiff-Lightning: Cross-Model Diffusion Distillation Paper • 2403.12706 • Published Mar 19, 2024 • 17
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization Paper • 2404.09956 • Published Apr 15, 2024 • 11
MotionMaster: Training-free Camera Motion Transfer For Video Generation Paper • 2404.15789 • Published Apr 24, 2024 • 10
LLM-AD: Large Language Model based Audio Description System Paper • 2405.00983 • Published May 2, 2024 • 17
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published May 19, 2024 • 54
ReVideo: Remake a Video with Motion and Content Control Paper • 2405.13865 • Published May 22, 2024 • 23
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation Paper • 2405.14598 • Published May 23, 2024 • 11
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition Paper • 2405.15216 • Published May 24, 2024 • 12
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models Paper • 2405.16537 • Published May 26, 2024 • 16
Looking Backward: Streaming Video-to-Video Translation with Feature Banks Paper • 2405.15757 • Published May 24, 2024 • 14
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer Paper • 2405.17405 • Published May 27, 2024 • 14
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control Paper • 2405.17414 • Published May 27, 2024 • 10
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning Paper • 2405.18386 • Published May 28, 2024 • 20
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Paper • 2405.18750 • Published May 29, 2024 • 21
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture Paper • 2405.18991 • Published May 29, 2024 • 12
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model Paper • 2405.20222 • Published May 30, 2024 • 11
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark Paper • 2405.19707 • Published May 30, 2024 • 7
Learning Temporally Consistent Video Depth from Video Diffusion Priors Paper • 2406.01493 • Published Jun 3, 2024 • 19
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation Paper • 2406.00908 • Published Jun 3, 2024 • 11
Searching Priors Makes Text-to-Video Synthesis Better Paper • 2406.03215 • Published Jun 5, 2024 • 11
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published Jun 6, 2024 • 73
VideoTetris: Towards Compositional Text-to-Video Generation Paper • 2406.04277 • Published Jun 6, 2024 • 24
MotionClone: Training-Free Motion Cloning for Controllable Video Generation Paper • 2406.05338 • Published Jun 8, 2024 • 39
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing Paper • 2406.06523 • Published Jun 10, 2024 • 50
Hierarchical Patch Diffusion Models for High-Resolution Video Generation Paper • 2406.07792 • Published Jun 12, 2024 • 13
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation Paper • 2406.07686 • Published Jun 11, 2024 • 14
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation Paper • 2406.08656 • Published Jun 12, 2024 • 7
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability,Reproducibility, and Practicality Paper • 2406.08845 • Published Jun 13, 2024 • 9
ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning Paper • 2406.14130 • Published Jun 20, 2024 • 10
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation Paper • 2406.15252 • Published Jun 21, 2024 • 14
DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models Paper • 2407.01519 • Published Jul 1, 2024 • 22
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix Paper • 2407.00367 • Published Jun 29, 2024 • 9
VIMI: Grounding Video Generation through Multi-modal Instruction Paper • 2407.06304 • Published Jul 8, 2024 • 10
VEnhancer: Generative Space-Time Enhancement for Video Generation Paper • 2407.07667 • Published Jul 10, 2024 • 14
Still-Moving: Customized Video Generation without Customized Video Data Paper • 2407.08674 • Published Jul 11, 2024 • 12
CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation Paper • 2407.06188 • Published Jul 8, 2024 • 1
TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models Paper • 2407.09012 • Published Jul 12, 2024 • 9
Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models Paper • 2407.10285 • Published Jul 14, 2024 • 4
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control Paper • 2407.12781 • Published Jul 17, 2024 • 13
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion Paper • 2407.13759 • Published Jul 18, 2024 • 17
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models Paper • 2407.15642 • Published Jul 22, 2024 • 11
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence Paper • 2407.16655 • Published Jul 23, 2024 • 30
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation Paper • 2407.14505 • Published Jul 19, 2024 • 26
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention Paper • 2407.19918 • Published Jul 29, 2024 • 49
Tora: Trajectory-oriented Diffusion Transformer for Video Generation Paper • 2407.21705 • Published Jul 31, 2024 • 27
Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion Paper • 2408.00458 • Published Aug 1, 2024 • 11
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model Paper • 2408.00762 • Published Aug 1, 2024 • 10
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation Paper • 2408.02629 • Published Aug 5, 2024 • 13
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer Paper • 2408.03284 • Published Aug 6, 2024 • 11
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics Paper • 2408.04631 • Published Aug 8, 2024 • 9
Kalman-Inspired Feature Propagation for Video Face Super-Resolution Paper • 2408.05205 • Published Aug 9, 2024 • 8
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Paper • 2408.06072 • Published Aug 12, 2024 • 37
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance Paper • 2408.08189 • Published Aug 15, 2024 • 16
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data Paper • 2408.10119 • Published Aug 19, 2024 • 16
TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models Paper • 2408.11318 • Published Aug 21, 2024 • 55
TrackGo: A Flexible and Efficient Method for Controllable Video Generation Paper • 2408.11475 • Published Aug 21, 2024 • 17
Real-Time Video Generation with Pyramid Attention Broadcast Paper • 2408.12588 • Published Aug 22, 2024 • 16
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities Paper • 2408.13239 • Published Aug 23, 2024 • 11
Training-free Long Video Generation with Chain of Diffusion Model Experts Paper • 2408.13423 • Published Aug 24, 2024 • 22
TVG: A Training-free Transition Video Generation Method with Diffusion Models Paper • 2408.13413 • Published Aug 24, 2024 • 14
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation Paper • 2408.15239 • Published Aug 27, 2024 • 29
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model Paper • 2409.01199 • Published Sep 2, 2024 • 14
Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation Paper • 2409.01055 • Published Sep 2, 2024 • 6
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4, 2024 • 91
OSV: One Step is Enough for High-Quality Image to Video Generation Paper • 2409.11367 • Published Sep 17, 2024 • 14
Towards Diverse and Efficient Audio Captioning via Diffusion Models Paper • 2409.09401 • Published Sep 14, 2024 • 7
LVCD: Reference-based Lineart Video Colorization with Diffusion Models Paper • 2409.12960 • Published Sep 19, 2024 • 24
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation Paper • 2409.12532 • Published Sep 19, 2024 • 5
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling Paper • 2409.16160 • Published Sep 24, 2024 • 33
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published Sep 27, 2024 • 26
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide Paper • 2410.04364 • Published Oct 6, 2024 • 28
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark Paper • 2410.03051 • Published Oct 4, 2024 • 5
Pyramidal Flow Matching for Efficient Video Generative Modeling Paper • 2410.05954 • Published Oct 8, 2024 • 39
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design Paper • 2410.05677 • Published Oct 8, 2024 • 14
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Paper • 2410.02757 • Published Oct 3, 2024 • 36
Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published Oct 14, 2024 • 54
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention Paper • 2410.10774 • Published Oct 14, 2024 • 25
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions Paper • 2410.10816 • Published Oct 14, 2024 • 20
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control Paper • 2410.13830 • Published Oct 17, 2024 • 24
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Paper • 2410.17434 • Published Oct 22, 2024 • 25
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality Paper • 2410.19355 • Published Oct 25, 2024 • 23
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale Paper • 2410.20280 • Published Oct 26, 2024 • 23
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation Paper • 2410.23277 • Published Oct 30, 2024 • 9
Fashion-VDM: Video Diffusion Model for Virtual Try-On Paper • 2411.00225 • Published Oct 31, 2024 • 9
Adaptive Caching for Faster Video Generation with Diffusion Transformers Paper • 2411.02397 • Published Nov 4, 2024 • 23
Motion Control for Enhanced Complex Action Video Generation Paper • 2411.08328 • Published Nov 13, 2024 • 5
AnimateAnything: Consistent and Controllable Animation for Video Generation Paper • 2411.10836 • Published Nov 16, 2024 • 23
StableV2V: Stablizing Shape Consistency in Video-to-Video Editing Paper • 2411.11045 • Published Nov 17, 2024 • 11
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations Paper • 2411.10818 • Published Nov 16, 2024 • 24
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models Paper • 2411.13503 • Published Nov 20, 2024 • 30
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control Paper • 2411.13807 • Published Nov 21, 2024 • 11
Efficient Long Video Tokenization via Coordinated-based Patch Reconstruction Paper • 2411.14762 • Published Nov 22, 2024 • 11
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement Paper • 2411.15115 • Published Nov 22, 2024 • 9
DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation Paper • 2411.16657 • Published Nov 25, 2024 • 17
AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation Paper • 2411.17383 • Published Nov 26, 2024 • 6
Identity-Preserving Text-to-Video Generation by Frequency Decomposition Paper • 2411.17440 • Published Nov 26, 2024 • 35
Free^2Guide: Gradient-Free Path Integral Control for Enhancing Text-to-Video Generation with Large Vision-Language Models Paper • 2411.17041 • Published Nov 26, 2024 • 12
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model Paper • 2411.19108 • Published Nov 28, 2024 • 17
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling Paper • 2411.18664 • Published Nov 27, 2024 • 23
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers Paper • 2411.18673 • Published Nov 27, 2024 • 8
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation Paper • 2412.00927 • Published Dec 1, 2024 • 26
Open-Sora Plan: Open-Source Large Video Generation Model Paper • 2412.00131 • Published Nov 28, 2024 • 33
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video Paper • 2411.18671 • Published Nov 27, 2024 • 20
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model Paper • 2411.17459 • Published Nov 26, 2024 • 10
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation Paper • 2412.01316 • Published Dec 2, 2024 • 8
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation Paper • 2412.02259 • Published Dec 3, 2024 • 59
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images Paper • 2412.03517 • Published Dec 4, 2024 • 18
Mimir: Improving Video Diffusion Models for Precise Text Understanding Paper • 2412.03085 • Published Dec 4, 2024 • 12
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment Paper • 2412.04814 • Published Dec 6, 2024 • 45
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration Paper • 2412.04440 • Published Dec 5, 2024 • 19
Mind the Time: Temporally-Controlled Multi-Event Video Generation Paper • 2412.05263 • Published Dec 6, 2024 • 10
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation Paper • 2412.04432 • Published Dec 5, 2024 • 14
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance Paper • 2412.05355 • Published Dec 6, 2024 • 7
STIV: Scalable Text and Image Conditioned Video Generation Paper • 2412.07730 • Published 27 days ago • 70
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published 27 days ago • 50
StyleMaster: Stylize Your Video with Artistic Generation and Translation Paper • 2412.07744 • Published 27 days ago • 19
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation Paper • 2412.06016 • Published 29 days ago • 20
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation Paper • 2412.09349 • Published 25 days ago • 8
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption Paper • 2412.09283 • Published 25 days ago • 19
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity Paper • 2412.09856 • Published 25 days ago • 9
VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping Paper • 2412.11279 • Published 22 days ago • 12
SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner Paper • 2412.10533 • Published 24 days ago • 5
MIVE: New Design and Benchmark for Multi-Instance Video Editing Paper • 2412.12877 • Published 20 days ago • 4
Autoregressive Video Generation without Vector Quantization Paper • 2412.14169 • Published 19 days ago • 14
Large Motion Video Autoencoding with Cross-modal Video VAE Paper • 2412.17805 • Published 14 days ago • 24
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation Paper • 2412.18597 • Published 13 days ago • 19
MotiF: Making Text Count in Image Animation with Motion Focal Loss Paper • 2412.16153 • Published 17 days ago • 6
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models Paper • 2412.19645 • Published 10 days ago • 13
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Paper • 2501.01427 • Published 4 days ago • 42
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published 6 days ago • 38
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models Paper • 2501.01423 • Published 4 days ago • 32
Unifying Specialized Visual Encoders for Video Language Models Paper • 2501.01426 • Published 4 days ago • 20