Zesen Cheng

ClownRat

AI & ML interests

Multi-modal foundation models; segmentation, detection, and tracking

ClownRat's activity

authored a paper 1 day ago

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published 7 days ago • 39

upvoted a paper 1 day ago

Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Paper • 2412.18525 • Published 14 days ago • 64

updated a model 2 days ago

ClownRat/VideoLLaMA2.1-7B-16F

Text Generation • Updated 2 days ago • 4

upvoted 2 papers 2 days ago

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published 7 days ago • 39

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published 6 days ago • 87

updated 2 models 13 days ago

ClownRat/resnet-50-torchvision

Updated 13 days ago • 1.57k

ClownRat/mask2former-resnet-50-coco-instance

Updated 13 days ago • 847

updated a model 16 days ago

ClownRat/resnet-101-torchvision

Updated 16 days ago • 7

updated a collection 19 days ago

Mask2Former

Collection

2 items • Updated 19 days ago

liked a dataset 20 days ago

ClownRat/COCO2017-Instance

Viewer • Updated 27 days ago • 123k • 22 • 1

updated a model 22 days ago

ClownRat/mask2former-resnet-101-coco-instance

Updated 22 days ago • 10

updated a dataset 27 days ago

ClownRat/COCO2017-Instance

Viewer • Updated 27 days ago • 123k • 22 • 1

upvoted 3 papers about 1 month ago

Towards Universal Soccer Video Understanding

Paper • 2412.01820 • Published Dec 2, 2024 • 9

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

Paper • 2412.03304 • Published Dec 4, 2024 • 17

VisionZip: Longer is Better but Not Necessary in Vision Language Models

Paper • 2412.04467 • Published Dec 5, 2024 • 105

liked 2 datasets about 1 month ago

hyc2026/MovieStory101

Updated Nov 19, 2024 • 77 • 5

Uni-MoE/VideoVista

Updated Jul 5, 2024 • 90 • 2

upvoted 2 papers about 1 month ago

Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning

Paper • 2412.03565 • Published Dec 4, 2024 • 11

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

Paper • 2412.03069 • Published Dec 4, 2024 • 30