Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 29 days ago • 69
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Paper • 2411.14405 • Published Nov 21, 2024 • 58
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training Paper • 2309.17179 • Published Sep 29, 2023 • 2
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model Paper • 2410.13639 • Published Oct 17, 2024 • 17
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? Paper • 2411.16489 • Published Nov 25, 2024 • 41
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning Paper • 2410.02884 • Published Oct 3, 2024 • 53
Tree of Problems: Improving structured problem solving with compositionality Paper • 2410.06634 • Published Oct 9, 2024 • 8
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling Paper • 2407.21787 • Published Jul 31, 2024 • 12
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6, 2024 • 54
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published 18 days ago • 38
The Surprising Effectiveness of Test-Time Training for Abstract Reasoning Paper • 2411.07279 • Published Nov 11, 2024 • 3
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs Paper • 2410.18451 • Published Oct 24, 2024 • 16
Generative Verifiers: Reward Modeling as Next-Token Prediction Paper • 2408.15240 • Published Aug 27, 2024 • 13
Understanding Hidden Computations in Chain-of-Thought Reasoning Paper • 2412.04537 • Published Dec 5, 2024
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Paper • 2412.17256 • Published 16 days ago • 44
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning Paper • 2410.02089 • Published Oct 2, 2024 • 12
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement Paper • 2412.12881 • Published 22 days ago • 1
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective Paper • 2412.14135 • Published 20 days ago
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models Paper • 2412.11605 • Published 23 days ago • 16
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM Paper • 2501.01904 • Published 4 days ago • 22
Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search Paper • 2411.11694 • Published Nov 18, 2024
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling Paper • 2408.16737 • Published Aug 29, 2024 • 1