---
library_name: transformers
license: mit
datasets:
- hendrydong/preference_700K
base_model:
- microsoft/Phi-3-mini-4k-instruct
---

# phi-instruct-segment Model Card

## Method

The segment reward model assigns rewards to semantically meaningful text segments, delimited dynamically with an entropy-based threshold. It is trained on binary preference labels from human feedback, optimizing a Bradley-Terry loss in which each response's score is the average of its segment rewards.
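
As a rough illustration only (not the authors' released code), the objective can be sketched in PyTorch as below. The entropy threshold, helper names, and tensor shapes are all assumptions made for the example.

```python
# Sketch of: (1) entropy-based dynamic segmentation, (2) a Bradley-Terry
# preference loss whose response score is the AVERAGE of segment rewards.
# All names, shapes, and the threshold value are illustrative assumptions.
import torch
import torch.nn.functional as F

def entropy_segment_starts(logits: torch.Tensor, threshold: float) -> list[int]:
    """Start a new segment after any position whose next-token predictive
    entropy exceeds `threshold`. `logits` has shape (seq_len, vocab)."""
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)  # (seq_len,)
    cuts = (entropy > threshold).nonzero(as_tuple=True)[0]
    return [0] + [int(c) + 1 for c in cuts if int(c) + 1 < logits.size(0)]

def response_score(segment_rewards: torch.Tensor) -> torch.Tensor:
    """Aggregate per-segment rewards into one response-level score (average)."""
    return segment_rewards.mean()

def bradley_terry_loss(chosen_segment_rewards: list[torch.Tensor],
                       rejected_segment_rewards: list[torch.Tensor]) -> torch.Tensor:
    """-log sigmoid(score_chosen - score_rejected), averaged over the batch."""
    s_chosen = torch.stack([response_score(r) for r in chosen_segment_rewards])
    s_rejected = torch.stack([response_score(r) for r in rejected_segment_rewards])
    return -F.logsigmoid(s_chosen - s_rejected).mean()
```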

<div align="center">

![image/png](https://cdn-uploads.huggingface.co/production/uploads/605e8dfd5abeb13e714c4c18/GnDEETLQeFpqx7-enIENw.png)

</div>
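
A hypothetical usage sketch follows. The repo id, the `AutoModelForSequenceClassification` loading path, `num_labels=1`, and `trust_remote_code=True` are assumptions; the checkpoint may ship custom scoring code that differs from this.

```python
# Hypothetical scoring example; the model id and loading class are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "phi-instruct-segment"  # placeholder: substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=1, trust_remote_code=True
)

messages = [
    {"role": "user", "content": "What is RLHF?"},
    {"role": "assistant", "content": "RLHF fine-tunes a model on human preference data."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
with torch.no_grad():
    reward = model(input_ids=input_ids).logits.squeeze()  # scalar reward
print(float(reward))
```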