Update README.md
README.md
CHANGED
@@ -7,4 +7,4 @@ base_model:
 - microsoft/Phi-3-mini-4k-instruct
 ---
 
-This is the token-wise reward model introduced in the preprint Segmenting Text and Learning Their Rewards for Improved RLHF in Language Models (https://arxiv.org/abs/2501.02790). For more details, please visit our repository at https://github.com/yinyueqin/DenseRewardRLHF-PPO.
+This is the token-wise reward model introduced in the preprint **Segmenting Text and Learning Their Rewards for Improved RLHF in Language Models** (https://arxiv.org/abs/2501.02790). For more details, please visit our repository at https://github.com/yinyueqin/DenseRewardRLHF-PPO.