VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
Abstract
This paper presents a novel paradigm for building scalable 3D generative models utilizing pre-trained video diffusion models. The primary obstacle in developing foundation 3D generative models is the limited availability of 3D data. Unlike images, text, or videos, 3D data are not readily accessible and are difficult to acquire. This results in a significant disparity in scale compared to the vast quantities of other types of data. To address this issue, we propose using a video diffusion model, trained with extensive volumes of text, images, and videos, as a knowledge source for 3D data. By unlocking its multi-view generative capabilities through fine-tuning, we generate a large-scale synthetic multi-view dataset to train a feed-forward 3D generative model. The proposed model, VFusion3D, trained on nearly 3M synthetic multi-view data, can generate a 3D asset from a single image in seconds and achieves superior performance when compared to current SOTA feed-forward 3D generative models, with users preferring our results more than 70% of the time.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- V3D: Video Diffusion Models are Effective 3D Generators (2024)
- LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation (2024)
- IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation (2024)
- Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting (2024)
- Envision3D: One Image to 3D with Anchor Views Interpolation (2024)
VFusion3D: Revolutionizing 3D Model Generation with Video Diffusion
Links:
Subscribe: https://www.youtube.com/@Arxflix
Twitter: https://x.com/arxflix
LMNT (Partner): https://lmnt.com/