Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23, 2024 • 23
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23, 2024 • 23 • 4
In-Context Imitation Learning via Next-Token Prediction Paper • 2408.15980 • Published Aug 28, 2024 • 9
Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models Paper • 2408.06663 • Published Aug 13, 2024 • 16
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way Paper • 2312.00407 • Published Dec 1, 2023 • 2
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning Paper • 2402.06332 • Published Feb 9, 2024 • 18
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning Paper • 2402.06332 • Published Feb 9, 2024 • 18
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models Paper • 2401.06066 • Published Jan 11, 2024 • 44
Leveraging Large Language Models for Automated Proof Synthesis in Rust Paper • 2311.03739 • Published Nov 7, 2023 • 5