Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation • Paper • 2501.03225 • Published 1 day ago • 4
Apollo: An Exploration of Video Understanding in Large Multimodal Models • Paper • 2412.10360 • Published 25 days ago • 136
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision • Paper • 2407.06189 • Published Jul 8, 2024 • 26
VideoAgent: Long-form Video Understanding with Large Language Model as Agent • Paper • 2403.10517 • Published Mar 15, 2024 • 32
Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models • Paper • 2305.17311 • Published May 27, 2023 • 1
Describing Differences in Image Sets with Natural Language • Paper • 2312.02974 • Published Dec 5, 2023 • 13