RLHFlow's Collections
SFT Models
Updated Nov 3, 2024
We train a series of SFT models on RLHFlow's high-quality SFT dataset for research purposes.
RLHFlow/LLaMA3-SFT • Text Generation • Updated Nov 3, 2024 • 4.77k • 8
RLHFlow/RLHFlow-SFT-Dataset-ver2 • Viewer • Updated Nov 2, 2024 • 2.32M • 75 • 4
RLHFlow/LLaMA3-SFT-v2 • Text Generation • Updated Nov 3, 2024 • 604
RLHFlow/Llama3-SFT-v2.0-epoch1 • Text Generation • Updated Nov 3, 2024 • 20
RLHFlow/Llama3-SFT-v2.0-epoch2 • Text Generation • Updated Nov 3, 2024 • 12
RLHFlow/Llama3-SFT-v2.0-epoch3 • Text Generation • Updated Nov 3, 2024 • 1.06k