Collections
Discover the best community collections!
Collections including paper arxiv:2501.08313
-
Cosmos World Foundation Model Platform for Physical AI
Paper • 2501.03575 • Published • 63 -
Phi-4 Technical Report
Paper • 2412.08905 • Published • 103 -
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 258 -
MangaNinja: Line Art Colorization with Precise Reference Following
Paper • 2501.08332 • Published • 48
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 24 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 95 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 25 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 94
-
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 230 -
Learning an evolved mixture model for task-free continual learning
Paper • 2207.05080 • Published • 1 -
EVOLvE: Evaluating and Optimizing LLMs For Exploration
Paper • 2410.06238 • Published • 1 -
Smaller Language Models Are Better Instruction Evolvers
Paper • 2412.11231 • Published • 27
-
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 95 -
ProgCo: Program Helps Self-Correction of Large Language Models
Paper • 2501.01264 • Published • 25 -
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Paper • 2501.01957 • Published • 40 -
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
Paper • 2501.03226 • Published • 35
-
How to Synthesize Text Data without Model Collapse?
Paper • 2412.14689 • Published • 48 -
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
Paper • 2412.12094 • Published • 10 -
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Paper • 2306.07691 • Published • 5 -
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
Paper • 2203.02395 • Published
-
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Paper • 2411.11504 • Published • 20 -
Top-nσ: Not All Logits Are You Need
Paper • 2411.07641 • Published • 20 -
Adaptive Decoding via Latent Preference Optimization
Paper • 2411.09661 • Published • 10 -
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Paper • 2411.13476 • Published • 15
-
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Paper • 2411.04952 • Published • 28 -
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models
Paper • 2411.05005 • Published • 13 -
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
Paper • 2411.04075 • Published • 16 -
Self-Consistency Preference Optimization
Paper • 2411.04109 • Published • 17