This repository contains the released models for our paper "Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model".
- yyqoni/Phi-3-mini-4k-instruct-segment-rm-700k (Text Classification)
- yyqoni/Phi-3-mini-4k-instruct-token-rm-700k (Text Classification)
- yyqoni/Phi-3-mini-4k-instruct-bandit-rm-700k (Text Classification)
- yyqoni/rlhflow-llama-3-sft-8b-v2-segment-rm-700k (Text Classification)