Zesen Cheng
ClownRat
AI & ML interests
Multi-modal foundation models; segmentation, detection, and tracking
Recent Activity
authored a paper about 7 hours ago
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
upvoted a paper about 10 hours ago
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization
updated a model about 16 hours ago
ClownRat/VideoLLaMA2.1-7B-16F