Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23, 2024 • 23
In-Context Imitation Learning via Next-Token Prediction Paper • 2408.15980 • Published Aug 28, 2024 • 9
Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models Paper • 2408.06663 • Published Aug 13, 2024 • 16
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning Paper • 2402.06332 • Published Feb 9, 2024 • 18
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models Paper • 2401.06066 • Published Jan 11, 2024 • 44
Leveraging Large Language Models for Automated Proof Synthesis in Rust Paper • 2311.03739 • Published Nov 7, 2023 • 5