VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published 7 days ago • 39
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published 14 days ago • 64
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published 7 days ago • 39
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 6 days ago • 87
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation Paper • 2412.03304 • Published Dec 4, 2024 • 17
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 105
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning Paper • 2412.03565 • Published Dec 4, 2024 • 11
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation Paper • 2412.03069 • Published Dec 4, 2024 • 30