This is a collection of datasets and models for process reward modeling.
AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
Models (19)
RLHFlow/Llama3.1-8B-PRM-Mistral-Data • Text Generation • 2.27k • 7
RLHFlow/Llama3.1-8B-PRM-Deepseek-Data • Text Generation • 12.3k • 30
RLHFlow/Llama3.1-8B-ORM-Deepseek-Data • Text Generation • 146
RLHFlow/Llama3.1-8B-ORM-Mistral-Data • Text Generation • 416
RLHFlow/Llama3-v2-iterative-DPO-iter3 • Text Generation • 22
RLHFlow/Llama3-v2-iterative-DPO-iter2 • Text Generation • 10
RLHFlow/Llama3-v2-iterative-DPO-iter1 • Text Generation • 13
RLHFlow/LLaMA3-SFT-v2 • Text Generation • 604
RLHFlow/Llama3-SFT-v2.0-epoch1 • Text Generation • 20
RLHFlow/Llama3-SFT-v2.0-epoch2 • Text Generation • 12
Datasets (64)
RLHFlow/DS-and-Mistral-PRM-Data • Viewer • 526k • 31
RLHFlow/Deepseek-MATH500-Test • Viewer • 500 • 87
RLHFlow/Mistral-MATH500-Test • Viewer • 500 • 120
RLHFlow/Deepseek-ORM-Data • Viewer • 253k • 53 • 2
RLHFlow/Deepseek-PRM-Data • Viewer • 253k • 70 • 8
RLHFlow/Mistral-ORM-Data • Viewer • 273k • 132 • 2
RLHFlow/Mistral-PRM-Data • Viewer • 273k • 93 • 9
RLHFlow/Mistral-MATH500-Test-Result-of-Mistral-PRM • Viewer • 500 • 34
RLHFlow/Mistral-MATH500-Test-Result-of-Mistral-ORM • Viewer • 500 • 38
RLHFlow/Mistral-GSM8K-Test-Result-of-Mistral-ORM • Viewer • 1.32k • 30
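
The entries above come in PRM (process reward model) and ORM (outcome reward model) variants. As a rough illustration of how the two kinds of scores might be used to rank candidate solutions, the sketch below assumes a PRM yields one score per reasoning step and an ORM yields one score per full response. The function names, the candidate dictionaries, the scores, and the choice of the minimum as the aggregation rule are all hypothetical and are not taken from the RLHFlow code or model cards.

```python
from typing import List, Sequence


def aggregate_step_scores(step_scores: Sequence[float]) -> float:
    """Collapse per-step scores into one number.

    Assumption: use the minimum, so a single weak step lowers the
    whole solution. Other rules (product, last step) are also possible.
    """
    if not step_scores:
        raise ValueError("need at least one step score")
    return min(step_scores)


def best_of_n(candidates: List[dict]) -> dict:
    """Return the candidate whose aggregated score is highest."""
    return max(candidates, key=lambda c: aggregate_step_scores(c["step_scores"]))


# Hypothetical scores for two candidate solutions to the same problem.
candidates = [
    {"answer": "A", "step_scores": [0.9, 0.2, 0.95]},  # one weak step
    {"answer": "B", "step_scores": [0.8, 0.7, 0.75]},  # uniformly decent
]

best = best_of_n(candidates)
print(best["answer"])
```

An ORM-style score fits the same interface as a list with a single entry, since it rates the response as a whole rather than each step.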