SamPO Collection: resources for the EMNLP 2024 paper "Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence"
This repository provides a version of Pythia-2.8B fine-tuned with our proposed SamPO algorithm, described in "Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence".
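As a rough illustration of what "down-sampled" could mean here, the sketch below computes a DPO-style loss after randomly sampling the same number of token-level log-ratios from the chosen and rejected responses, so that the longer response does not contribute more terms to the sum. This is a minimal sketch under stated assumptions, not the released training code: the function names `sampo_style_loss` and `_softplus`, the list-of-floats inputs, and the default `beta=0.1` are all hypothetical choices made for the example.

```python
import math
import random


def _softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sampo_style_loss(chosen_logratios, rejected_logratios, beta=0.1, rng=None):
    """DPO-style loss on equal-sized random samples of token log-ratios.

    Each input is a list of per-token values
    log pi_theta(token) - log pi_ref(token) for one response.
    `beta` and the uniform sampling scheme are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    # Use the same number of tokens from both responses.
    k = min(len(chosen_logratios), len(rejected_logratios))
    chosen_sample = rng.sample(chosen_logratios, k)
    rejected_sample = rng.sample(rejected_logratios, k)
    margin = beta * (sum(chosen_sample) - sum(rejected_sample))
    # -log(sigmoid(margin)) equals softplus(-margin).
    return _softplus(-margin)


if __name__ == "__main__":
    chosen = [0.2, 0.1, 0.3]
    rejected = [-0.1, 0.0, -0.2, -0.3, -0.1]  # longer response
    print(sampo_style_loss(chosen, rejected))
```

When both responses have the same length, every token is kept and the value matches the ordinary sequence-level DPO loss for that pair; the sampling only changes the result when the lengths differ.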
Method (vs. SFT) | GPT-4 win rate (%) | Length (tokens) |
---|---|---|
DPO | 60.98 | 53.8 |
Iterative DPO | 73.58 | 66.65 |
Length Normed DPO | 58.13 | 47.34 |
SimPO | 33.33 | 31.9 |
Iterative SamPO | 73.58 | 49.54 |
We evaluate our model using the GPT-4 win-rate prompt template from the DPO paper. The sampled test set is included in this repo.
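For readers reproducing the comparison, a win rate of this kind can be computed from per-example judge verdicts roughly as follows. This is only a sketch: the helper name `win_rate` and the boolean encoding of verdicts (`True` when the judge prefers the fine-tuned model over the SFT baseline) are assumptions for illustration, and it does not model ties or the prompt template itself.

```python
def win_rate(verdicts):
    """Return the percentage of comparisons won.

    `verdicts` is a list of booleans, one per test prompt, where True
    means the judge preferred the evaluated model (assumed encoding).
    """
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    return 100.0 * sum(verdicts) / len(verdicts)


if __name__ == "__main__":
    print(win_rate([True, True, False, True]))
```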
The following hyperparameters were used during DPO/SamPO training: