Papers
arxiv:2102.12092

Zero-Shot Text-to-Image Generation

Published on Feb 24, 2021
Authors:
,
,
,
,
,
,
,

Abstract

Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. With sufficient data and scale, our approach is competitive with previous domain-specific models when evaluated in a zero-shot fashion.

Community

Achieving Zero-Shot Text-to-Image Generation with Autoregressive Transformers

Links ๐Ÿ”—:

๐Ÿ‘‰ Subscribe: https://www.youtube.com/@Arxflix
๐Ÿ‘‰ Twitter: https://x.com/arxflix
๐Ÿ‘‰ LMNT (Partner): https://lmnt.com/

By Arxflix
9t4iCUHx_400x400-1.jpg

Sign up or log in to comment

Models citing this paper 7

Browse 7 models citing this paper

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2102.12092 in a dataset README.md to link it from this page.

Spaces citing this paper 120

Collections including this paper 2