LLM4SR: A Survey on Large Language Models for Scientific Research • Paper • 2501.04306 • Published 10 days ago • 33
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs • Paper • 2406.18629 • Published Jun 26, 2024 • 42