RLHFlow

university

AI & ML interests

Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/

Recent Activity

hendrydong authored a paper 14 days ago

Offline Reinforcement Learning for LLM Multi-Step Reasoning

hendrydong new activity about 2 months ago

RLHFlow/LLaMA3.2-1B-SFT: the training data for this model?

weqweasdas updated a dataset about 2 months ago

RLHFlow/DS-and-Mistral-PRM-Data

Collections 8

models 19

datasets 64

RLHFlow/DS-and-Mistral-PRM-Data

Viewer • Updated Nov 10, 2024 • 526k • 31

RLHFlow/Deepseek-MATH500-Test

Viewer • Updated Nov 9, 2024 • 500 • 87

RLHFlow/Mistral-MATH500-Test

Viewer • Updated Nov 9, 2024 • 500 • 120

RLHFlow/Deepseek-ORM-Data

Viewer • Updated Nov 9, 2024 • 253k • 53 • 2

RLHFlow/Deepseek-PRM-Data

Viewer • Updated Nov 9, 2024 • 253k • 70 • 8

RLHFlow/Mistral-ORM-Data

Viewer • Updated Nov 9, 2024 • 273k • 132 • 2

RLHFlow/Mistral-PRM-Data

Viewer • Updated Nov 9, 2024 • 273k • 93 • 9

RLHFlow/Mistral-MATH500-Test-Result-of-Mistral-PRM

Viewer • Updated Nov 8, 2024 • 500 • 34

RLHFlow/Mistral-MATH500-Test-Result-of-Mistral-ORM

Viewer • Updated Nov 8, 2024 • 500 • 38

RLHFlow/Mistral-GSM8K-Test-Result-of-Mistral-ORM

Viewer • Updated Nov 8, 2024 • 1.32k • 30

Collections 8

RLHFlow/Mistral-PRM-Data

RLHFlow/Mistral-GSM8K-Test

RLHFlow/Mistral-MATH500-Test

RLHFlow/Llama3.1-8B-PRM-Mistral-Data

RLHFlow/UltraFeedback-preference-standard

RLHFlow/Helpsteer-preference-standard

RLHFlow/HH-RLHF-Helpful-standard

RLHFlow/Orca-distibalel-standard

models 19

RLHFlow/Llama3.1-8B-PRM-Mistral-Data

RLHFlow/Llama3.1-8B-PRM-Deepseek-Data

RLHFlow/Llama3.1-8B-ORM-Deepseek-Data

RLHFlow/Llama3.1-8B-ORM-Mistral-Data

RLHFlow/Llama3-v2-iterative-DPO-iter3

RLHFlow/Llama3-v2-iterative-DPO-iter2

RLHFlow/Llama3-v2-iterative-DPO-iter1

RLHFlow/LLaMA3-SFT-v2

RLHFlow/Llama3-SFT-v2.0-epoch1

RLHFlow/Llama3-SFT-v2.0-epoch2

Team members 6
