Negative Token Merging: Image-based Adversarial Feature Guidance Paper • 2412.01339 • Published Dec 2, 2024 • 22
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory Paper • 2411.11922 • Published Nov 18, 2024 • 18
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples Paper • 2410.14669 • Published Oct 18, 2024 • 36
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities Paper • 2408.00765 • Published Aug 1, 2024 • 13
DIALGEN: Collaborative Human-LM Generated Dialogues for Improved Understanding of Human-Human Conversations Paper • 2307.07047 • Published Jul 13, 2023 • 15
In-Context Learning for Few-Shot Dialogue State Tracking Paper • 2203.08568 • Published Mar 16, 2022 • 1
OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking Paper • 2311.09758 • Published Nov 16, 2023 • 1
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models Paper • 2406.09403 • Published Jun 13, 2024 • 19
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12, 2024 • 24
SINDy-RL: Interpretable and Efficient Model-Based Reinforcement Learning Paper • 2403.09110 • Published Mar 14, 2024
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs Paper • 2404.16375 • Published Apr 25, 2024 • 16
Instruction-tuned Language Models are Better Knowledge Learners Paper • 2402.12847 • Published Feb 20, 2024 • 25
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation Paper • 2311.07562 • Published Nov 13, 2023 • 13
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences Paper • 2311.06025 • Published Nov 10, 2023 • 1
MM-VID: Advancing Video Understanding with GPT-4V(ision) Paper • 2310.19773 • Published Oct 30, 2023 • 19
Detecting Pretraining Data from Large Language Models Paper • 2310.16789 • Published Oct 25, 2023 • 10
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design Paper • 2310.15144 • Published Oct 23, 2023 • 13
OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation Paper • 2310.07749 • Published Oct 11, 2023 • 5