2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 5 days ago • 82
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published 4 days ago • 41
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models Paper • 2501.01423 • Published 4 days ago • 32
REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents Paper • 2411.13552 • Published Nov 20, 2024
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up Paper • 2412.16112 • Published 17 days ago • 21
Efficient Scaling of Diffusion Transformers for Text-to-Image Generation Paper • 2412.12391 • Published 21 days ago
ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance Paper • 2412.06163 • Published 29 days ago
On the Surprising Effectiveness of Attention Transfer for Vision Transformers Paper • 2411.09702 • Published Nov 14, 2024 • 1
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers Paper • 2411.06786 • Published Nov 11, 2024
FlexDiT: Dynamic Token Density Control for Diffusion Transformer Paper • 2412.06028 • Published 29 days ago
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22, 2024 • 58
Training and Evaluating Language Models with Template-based Data Generation Paper • 2411.18104 • Published Nov 27, 2024 • 3
TransformLLM: Adapting Large Language Models via LLM-Transformed Reading Comprehension Text Paper • 2410.21479 • Published Oct 28, 2024
TinyLLaVA: A Framework of Small-scale Large Multimodal Models Paper • 2402.14289 • Published Feb 22, 2024 • 19
TinyLLM: Learning a Small Student from Multiple Large Language Models Paper • 2402.04616 • Published Feb 7, 2024
TinyEmo: Scaling down Emotional Reasoning via Metric Projection Paper • 2410.07062 • Published Oct 9, 2024 • 3
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Paper • 2408.15881 • Published Aug 28, 2024 • 21
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model Paper • 2404.04167 • Published Apr 5, 2024 • 12
Rethinking Optimization and Architecture for Tiny Language Models Paper • 2402.02791 • Published Feb 5, 2024 • 12
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones Paper • 2312.16862 • Published Dec 28, 2023 • 30
ProgCo: Program Helps Self-Correction of Large Language Models Paper • 2501.01264 • Published 4 days ago • 23
GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers Paper • 2412.09722 • Published 25 days ago • 5
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning Paper • 2412.09078 • Published 26 days ago
AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement Paper • 2412.06176 • Published 29 days ago
MC-NEST -- Enhancing Mathematical Reasoning in Large Language Models with a Monte Carlo Nash Equilibrium Self-Refine Tree Paper • 2411.15645 • Published Nov 23, 2024
PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback Paper • 2412.03578 • Published Nov 18, 2024
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision Paper • 2411.16579 • Published Nov 25, 2024 • 2
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 20 days ago • 119
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework Paper • 2308.08155 • Published Aug 16, 2023 • 3
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM Paper • 2501.01904 • Published 3 days ago • 12
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published 3 days ago • 20
SDPO: Segment-Level Direct Preference Optimization for Social Agents Paper • 2501.01821 • Published 3 days ago • 10
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation Paper • 2412.21059 • Published 7 days ago • 11
LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models Paper • 2501.00874 • Published 5 days ago • 7
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery Paper • 2501.01540 • Published 4 days ago • 4
Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations Paper • 2404.09785 • Published Apr 15, 2024
Gemma 2: Improving Open Language Models at a Practical Size Paper • 2408.00118 • Published Jul 31, 2024 • 76