yyqoni/Phi-3-mini-4k-instruct-token-rm-700k


Text Classification · text-generation-inference · Inference Endpoints


yyqoni committed 11 days ago

Commit b7229ec (verified) · 1 parent: a358680

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -7,4 +7,4 @@ base_model:
 - microsoft/Phi-3-mini-4k-instruct
 ---
-This is the token-wise reward model introduced in the preprint Segmenting Text and Learning Their Rewards for Improved RLHF in Language Models (https://arxiv.org/abs/2501.02790). For more details, please visit our repository at https://github.com/yinyueqin/DenseRewardRLHF-PPO.
+This is the token-wise reward model introduced in the preprint **Segmenting Text and Learning Their Rewards for Improved RLHF in Language Models** (https://arxiv.org/abs/2501.02790). For more details, please visit our repository at https://github.com/yinyueqin/DenseRewardRLHF-PPO.