---
library_name: transformers
license: mit
datasets:
- hendrydong/preference_700K
base_model:
- microsoft/Phi-3-mini-4k-instruct
---

# phi-instruct-segment Model Card

## Method

The segment reward model assigns rewards to semantically meaningful text segments, segmented dynamically with an entropy-based threshold. It is trained on binary preference labels from human feedback, optimizing a Bradley-Terry loss function that aggregates segment rewards using the average function.

<div align="center">

![image/png](https://cdn-uploads.huggingface.co/production/uploads/605e8dfd5abeb13e714c4c18/GnDEETLQeFpqx7-enIENw.png)

</div>
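The training objective above can be sketched in plain Python. This is a minimal, hypothetical illustration (the function names, the entropy threshold value, and the use of per-token entropies as the segmentation signal are assumptions, not the model's actual implementation): segments start wherever a token's predictive entropy exceeds a threshold, each segment gets a reward, segment rewards are averaged into a sequence reward, and a Bradley-Terry negative log-likelihood compares the chosen and rejected responses.

```python
import math

def entropy_segment(tokens, token_entropies, threshold=2.0):
    """Hypothetical entropy-based segmentation: open a new segment
    whenever a token's predictive entropy exceeds the threshold."""
    segments, current = [], []
    for tok, h in zip(tokens, token_entropies):
        if h > threshold and current:
            segments.append(current)
            current = []
        current.append(tok)
    if current:
        segments.append(current)
    return segments

def sequence_reward(segment_rewards):
    """Aggregate per-segment rewards with the average, as the card describes."""
    return sum(segment_rewards) / len(segment_rewards)

def bradley_terry_loss(chosen_segment_rewards, rejected_segment_rewards):
    """Negative log-likelihood of the Bradley-Terry preference model,
    applied to the averaged segment rewards of each response."""
    r_chosen = sequence_reward(chosen_segment_rewards)
    r_rejected = sequence_reward(rejected_segment_rewards)
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))
```

When the two responses receive equal average rewards, the loss is `log(2)`, and it decreases as the chosen response's average segment reward pulls ahead of the rejected one's.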