kaizuberbuehler's Collections
Synthetic Data and Self-Improvement
Self-Rewarding Language Models
Paper
•
2401.10020
•
Published
•
145
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper
•
2402.03620
•
Published
•
114
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper
•
2402.07456
•
Published
•
42
Learning From Mistakes Makes LLM Better Reasoner
Paper
•
2310.20689
•
Published
•
28
Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper
•
2404.07503
•
Published
•
29
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper
•
2404.03715
•
Published
•
60
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Paper
•
2404.02893
•
Published
•
20
Voyager: An Open-Ended Embodied Agent with Large Language Models
Paper
•
2305.16291
•
Published
•
9
Reflexion: Language Agents with Verbal Reinforcement Learning
Paper
•
2303.11366
•
Published
•
4
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Paper
•
2404.18796
•
Published
•
68
Extending Llama-3's Context Ten-Fold Overnight
Paper
•
2404.19553
•
Published
•
33
Diffusion for World Modeling: Visual Details Matter in Atari
Paper
•
2405.12399
•
Published
•
28
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Paper
•
2405.20541
•
Published
•
22
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Paper
•
2406.04325
•
Published
•
73
Paper
•
2406.09414
•
Published
•
95
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Paper
•
2406.08464
•
Published
•
66
Scaling Synthetic Data Creation with 1,000,000,000 Personas
Paper
•
2406.20094
•
Published
•
97
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper
•
2407.20798
•
Published
•
24
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper
•
2408.06195
•
Published
•
64
Data curation via joint example selection further accelerates multimodal learning
Paper
•
2406.17711
•
Published
•
3
Training Language Models to Self-Correct via Reinforcement Learning
Paper
•
2409.12917
•
Published
•
136
Thinking LLMs: General Instruction Following with Thought Generation
Paper
•
2410.10630
•
Published
•
18
How to Synthesize Text Data without Model Collapse?
Paper
•
2412.14689
•
Published
•
48
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
Paper
•
2412.08737
•
Published
•
52
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Paper
•
2412.19723
•
Published
•
70
ProgCo: Program Helps Self-Correction of Large Language Models
Paper
•
2501.01264
•
Published
•
23
Training Software Engineering Agents and Verifiers with SWE-Gym
Paper
•
2412.21139
•
Published
•
16
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper
•
2412.14922
•
Published
•
84
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper
•
2412.17256
•
Published
•
44
Diving into Self-Evolving Training for Multimodal Reasoning
Paper
•
2412.17451
•
Published
•
41
Paper
•
2412.16720
•
Published
•
29
ResearchTown: Simulator of Human Research Community
Paper
•
2412.17767
•
Published
•
12
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Paper
•
2412.11605
•
Published
•
16
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
Paper
•
2412.09605
•
Published
•
26
Evaluating Language Models as Synthetic Data Generators
Paper
•
2412.03679
•
Published
•
46
SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Code Generation
Paper
•
2411.11053
•
Published
•
3
CodeDPO: Aligning Code Models with Self Generated and Verified Source Code
Paper
•
2410.05605
•
Published
•
1
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
Paper
•
2411.16579
•
Published
•
2
Vision-Language Models Can Self-Improve Reasoning via Reflection
Paper
•
2411.00855
•
Published
•
5
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper
•
2411.04282
•
Published
•
32
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Paper
•
2411.14432
•
Published
•
22
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Paper
•
2411.18203
•
Published
•
33
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Paper
•
2411.16594
•
Published
•
37
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Paper
•
2410.02884
•
Published
•
53
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Paper
•
2411.06559
•
Published
•
12
Generative World Explorer
Paper
•
2411.11844
•
Published
•
75
Large Language Models Can Self-Improve in Long-context Reasoning
Paper
•
2411.08147
•
Published
•
63
Stronger Models are NOT Stronger Teachers for Instruction Tuning
Paper
•
2411.07133
•
Published
•
35
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions
Paper
•
2411.07461
•
Published
•
22
Self-Consistency Preference Optimization
Paper
•
2411.04109
•
Published
•
17