Prince Canuma

prince-canuma

AI & ML interests

None yet

prince-canuma's activity

upvoted 6 papers 1 day ago

Diving into Self-Evolving Training for Multimodal Reasoning

Paper • 2412.17451 • Published 15 days ago • 41

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published 14 days ago • 34

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Paper • 2412.19326 • Published 11 days ago • 18

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published 10 days ago • 70

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published 5 days ago • 82

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published 7 days ago • 22

upvoted a collection 15 days ago

DeepSeek-VL2

14 items • Updated 15 days ago • 2

upvoted 12 papers 17 days ago

Phi-4 Technical Report

Paper • 2412.08905 • Published 26 days ago • 97

Large Action Models: From Inception to Implementation

Paper • 2412.10047 • Published 24 days ago • 31

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published 24 days ago • 136

Smaller Language Models Are Better Instruction Evolvers

Paper • 2412.11231 • Published 22 days ago • 27

Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published 25 days ago • 83

FastVLM: Efficient Vision Encoding for Vision Language Models

Paper • 2412.13303 • Published 20 days ago • 13

GUI Agents: A Survey

Paper • 2412.13501 • Published 20 days ago • 23

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published 19 days ago • 23

Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception

Paper • 2412.14233 • Published 19 days ago • 6

AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling

Paper • 2412.15084 • Published 18 days ago • 12

How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published 19 days ago • 48

Qwen2.5 Technical Report

Paper • 2412.15115 • Published 18 days ago • 337

upvoted a collection 20 days ago

Falcon3

Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. • 40 items • Updated 18 days ago • 75