Hidden in the Noise: Two-Stage Robust Watermarking for Images Paper • 2412.04653 • Published Dec 5, 2024 • 28
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 121
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models Paper • 2407.19474 • Published Jul 28, 2024 • 23
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision Paper • 2407.06189 • Published Jul 8, 2024 • 26
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision Paper • 2407.06189 • Published Jul 8, 2024 • 26
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 17, 2024 • 50
Make It Count: Text-to-Image Generation with an Accurate Number of Objects Paper • 2406.10210 • Published Jun 14, 2024 • 77
DOCCI: Descriptions of Connected and Contrasting Images Paper • 2404.19753 • Published Apr 30, 2024 • 11