Quantizations of https://huggingface.co/wzhouad/gemma-2-9b-it-WPO-HB
Inference Clients/UIs
From original readme
gemma-2-9b-it fine-tuned with hybrid WPO, using two types of data:
- On-policy sampled gemma outputs based on Ultrafeedback prompts.
- GPT-4-turbo outputs based on Ultrafeedback prompts.
Unlike the preference data construction method in our paper, we use RLHFlow/ArmoRM-Llama3-8B-v0.1 to score the outputs, and pair the highest-scoring output with the lowest-scoring one to form a preference pair.
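The max/min selection described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: `ScoredOutput` and `build_preference_pair` are hypothetical names, the scores are made-up placeholders rather than real ArmoRM outputs, and skipping prompts whose candidates all tie is an assumption of this sketch.

```python
from typing import NamedTuple, Optional


class ScoredOutput(NamedTuple):
    """One candidate response and its reward-model score (hypothetical container)."""
    text: str
    score: float


def build_preference_pair(prompt: str, candidates: list[ScoredOutput]) -> Optional[dict]:
    """Pick the highest- and lowest-scoring candidates as chosen/rejected.

    Returns None when no preference can be formed (assumption: ties are skipped).
    """
    if len(candidates) < 2:
        return None
    chosen = max(candidates, key=lambda c: c.score)
    rejected = min(candidates, key=lambda c: c.score)
    if chosen.score == rejected.score:
        # All candidates scored the same, so there is nothing to prefer.
        return None
    return {"prompt": prompt, "chosen": chosen.text, "rejected": rejected.text}


# Placeholder scores for illustration only; real scores would come from a reward model.
example_candidates = [
    ScoredOutput("response A", 0.12),
    ScoredOutput("response B", 0.31),
    ScoredOutput("response C", 0.07),
]
example_pair = build_preference_pair("Explain what a hash map is.", example_candidates)
```

Here `example_pair["chosen"]` is the text with the highest score and `example_pair["rejected"]` the text with the lowest; the middle candidate is discarded.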
We provide our training data at wzhouad/gemma-2-ultrafeedback-hybrid.