plmsmile's Collections
video llm
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper • 2404.16994 • Published • 35
VideoMamba: State Space Model for Efficient Video Understanding
Paper • 2403.06977 • Published • 27
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Paper • 2403.10517 • Published • 32
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Paper • 2403.09626 • Published • 13
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Paper • 2403.15377 • Published • 22
VidLA: Video-Language Alignment at Scale
Paper • 2403.14870 • Published • 12
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Paper • 2404.01258 • Published • 10
Streaming Dense Video Captioning
Paper • 2404.01297 • Published • 11
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Paper • 2404.03413 • Published • 25
Koala: Key frame-conditioned long video-LLM
Paper • 2404.04346 • Published • 6
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding
Paper • 2405.08344 • Published • 12
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 12
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Paper • 2405.21075 • Published • 21
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Paper • 2406.04325 • Published • 73
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models
Paper • 2406.16338 • Published • 25
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams
Paper • 2406.08085 • Published • 13