yyqoni committed on
Commit b7229ec · verified · 1 Parent(s): a358680

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -7,4 +7,4 @@ base_model:
  - microsoft/Phi-3-mini-4k-instruct
  ---
 
- This is the token-wise reward model introduced in the preprint Segmenting Text and Learning Their Rewards for Improved RLHF in Language Models (https://arxiv.org/abs/2501.02790). For more details, please visit our repository at https://github.com/yinyueqin/DenseRewardRLHF-PPO.
+ This is the token-wise reward model introduced in the preprint **Segmenting Text and Learning Their Rewards for Improved RLHF in Language Models** (https://arxiv.org/abs/2501.02790). For more details, please visit our repository at https://github.com/yinyueqin/DenseRewardRLHF-PPO.