Hugging Face
sfulay/zephyr-7b-dpo-full-gpt_consistent-reward-scale-1-rpo-gamma-2
Safetensors
mistral
trl
dpo
Generated from Trainer
License: apache-2.0
Commit History
aa34c83 (verified): Training in progress, step 436. sfulay committed on Sep 3, 2024
73d237e (verified): Training in progress, step 400. sfulay committed on Sep 3, 2024
97605a7 (verified): Training in progress, step 300. sfulay committed on Sep 2, 2024
6926fc7 (verified): Training in progress, step 200. sfulay committed on Sep 2, 2024
fc367dc (verified): Training in progress, step 100. sfulay committed on Sep 2, 2024
7451ad4 (verified): initial commit. sfulay committed on Sep 2, 2024