DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought Paper • 2412.17498 • Published 16 days ago • 21
Llama-3.1-Storm-8B: Improved SLM with Self-Curation + Model Merging Article • By akjindal53244 • Aug 19, 2024 • 75
Skywork-Reward-Data-Collection Collection Open-source preference datasets used to train the Skywork reward model series • 17 items • Updated Oct 12, 2024 • 12