Art Atk

ArtAtk

AI & ML interests

Multimodal Models

ArtAtk's activity

upvoted a paper 4 days ago

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Paper • 2501.01423 • Published 5 days ago • 33

upvoted 2 papers 5 days ago

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published 7 days ago • 39

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published 11 days ago • 73

upvoted a paper 11 days ago

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published 15 days ago • 34

upvoted a paper 13 days ago

Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

Paper • 2412.17739 • Published 15 days ago • 37

upvoted 3 papers 15 days ago

Outcome-Refining Process Supervision for Code Generation

Paper • 2412.15118 • Published 19 days ago • 19

B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published 16 days ago • 44

CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up

Paper • 2412.16112 • Published 18 days ago • 21

upvoted a paper 18 days ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published 19 days ago • 338

upvoted a paper 19 days ago

No More Adam: Learning Rate Scaling at Initialization is All You Need

Paper • 2412.11768 • Published 23 days ago • 41

upvoted a paper 22 days ago

ColorFlow: Retrieval-Augmented Image Sequence Colorization

Paper • 2412.11815 • Published 23 days ago • 26

upvoted a paper 23 days ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published 25 days ago • 136

upvoted a paper 24 days ago

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

Paper • 2412.09619 • Published 26 days ago • 20

upvoted a paper 26 days ago

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

Paper • 2412.07760 • Published 28 days ago • 50

upvoted a paper 30 days ago

Moto: Latent Motion Token as the Bridging Language for Robot Manipulation

Paper • 2412.04445 • Published Dec 5, 2024 • 21

upvoted 5 papers about 1 month ago