No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance Paper • 2404.04125 • Published Apr 4, 2024 • 27
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies Paper • 2404.08197 • Published Apr 12, 2024 • 28
Probing the 3D Awareness of Visual Foundation Models Paper • 2404.08636 • Published Apr 12, 2024 • 13
AM-RADIO: Agglomerative Model -- Reduce All Domains Into One Paper • 2312.06709 • Published Dec 10, 2023 • 1
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection Paper • 2405.10300 • Published May 16, 2024 • 27
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published Dec 5, 2024 • 59