PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 121
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling Paper • 2409.16160 • Published Sep 24, 2024 • 33
Portrait Video Editing Empowered by Multimodal Generative Priors Paper • 2409.13591 • Published Sep 20, 2024 • 16
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation Paper • 2409.08240 • Published Sep 12, 2024 • 19
AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks Paper • 2403.14468 • Published Mar 21, 2024 • 23
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models Paper • 2407.09025 • Published Jul 12, 2024 • 131
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding Paper • 2405.08748 • Published May 14, 2024 • 19
DreamTuner: Single Image is Enough for Subject-Driven Generation Paper • 2312.13691 • Published Dec 21, 2023 • 26
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on Paper • 2403.01779 • Published Mar 4, 2024 • 28
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks Paper • 2403.00522 • Published Mar 1, 2024 • 44
Retrieval-Augmented Generation for Large Language Models: A Survey Paper • 2312.10997 • Published Dec 18, 2023 • 10
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27, 2024 • 88
ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation Paper • 2312.02201 • Published Dec 2, 2023 • 31