Wujian Peng

wjpoom

https://scholar.google.com/citations?user=GTuWk9YAAAAJ&hl=zh-CN

AI & ML interests

None yet

Recent Activity

updated a dataset 8 days ago

Inst-IT/Inst-It-Bench

updated a dataset 8 days ago

wjpoom/Inst-It-Bench

updated a dataset 20 days ago

Inst-IT/Inst-IT-Dataset

wjpoom's activity

upvoted a paper 26 days ago

Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

Paper • 2312.00081 • Published Nov 30, 2023 • 2

upvoted a paper 30 days ago

Cross-Modality Safety Alignment

Paper • 2406.15279 • Published Jun 21, 2024 • 4

upvoted a paper about 1 month ago

Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning

Paper • 2412.03565 • Published Dec 4, 2024 • 11

upvoted a paper 5 months ago

RelBench: A Benchmark for Deep Learning on Relational Databases

Paper • 2407.20060 • Published Jul 29, 2024 • 7

upvoted 4 papers 6 months ago

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Paper • 2407.16982 • Published Jul 24, 2024 • 41

Understanding Reference Policies in Direct Preference Optimization

Paper • 2407.13709 • Published Jul 18, 2024 • 16

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Paper • 2407.07053 • Published Jul 9, 2024 • 43

Video Diffusion Alignment via Reward Gradients

Paper • 2407.08737 • Published Jul 11, 2024 • 48

upvoted an article 6 months ago

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Article • Apr 15, 2024 • 171

upvoted 4 papers 6 months ago

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Paper • 2407.01284 • Published Jul 1, 2024 • 75

upvoted 7 papers 7 months ago

Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning

Paper • 2406.12742 • Published Jun 18, 2024 • 14

Improving Visual Commonsense in Language Models via Multiple Image Generation

Paper • 2406.13621 • Published Jun 19, 2024 • 13

Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models

Paper • 2406.13542 • Published Jun 19, 2024 • 16

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

Paper • 2406.14544 • Published Jun 20, 2024 • 35

MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Paper • 2406.15252 • Published Jun 21, 2024 • 14

On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

Paper • 2406.16377 • Published Jun 24, 2024 • 12

Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

Paper • 2406.09170 • Published Jun 13, 2024 • 24