Commit f94f2de (verified) by sijan1 · 1 Parent(s): 453a0cb

Add SetFit model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
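The only pooling flag set to `true` above is `pooling_mode_mean_tokens`, so the sentence embedding is an average of token embeddings. The following is a minimal pure-Python sketch of that idea, not the sentence-transformers implementation; the function name `mean_pool` and the toy 3-dimensional vectors are hypothetical (the real vectors have 384 dimensions).

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask value is 1, ignoring padding."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask value is required per token")
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: two real tokens and one padding token (mask 0).
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)
```

The padding vector does not affect the result because its mask value is 0.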
README.md ADDED
@@ -0,0 +1,256 @@
+ ---
+ library_name: setfit
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ metrics:
+ - accuracy
+ widget:
+ - text: Hi Jonathan, I just happened to know that you are gathering information for
+     our Beta project. While your work is really nice insight and improvement ideas,
+     I feel the need to talk to you about what more can be done with your reports I
+     have received comments from our team that more time is needed to spent on extracting
+     information from your reports. Our team member are looking for technical information
+     and redundant comments takes them time to extract the fact and sometime confusing.
+     Another thing is that can help us is to organize the report in a more clear, concise
+     way. We are showing the reports to our prospect and even the CEO, so we need it
+     to be well structured, concise and to the point. I am sure if youspend more time
+     to organize your report, you will be able to address this problem. I know you
+     are an enthusiastic contributor and you have done a good work until now, but we
+     need your reports to be improved for our project team to success. I am afraid
+     if the situationis notgetting better we will have to look for someone else towork
+     on this project.Please spend more effort to organize your next report and I really
+     look forward to your good news
+ - text: Hi Jonathan, I hope you are doing well. Unfortunately I won't be able to talk
+     to you personally but as soon as I am back I would like to spend some time with
+     you. I know you are working on Beta project and your involvement is highly appreciated ,
+     you even identified improvements the team didn't identify, that's great! This
+     Beta project is key for the company, we need to success all together. In that
+     respect, key priorities are to build concise reports and with strong business
+     writing. Terry has been within the company for 5 years and is the best one to
+     be consulted to upskill in these areas. Could you please liaise with him and get
+     more quick wins from him. It will be very impactful in your career. We will discuss
+     once I'm back about this sharing experience. I'm sure you will find a lot of benefits.
+     Regards William
+ - text: 'Hi Jonathan, I am glad to hear that you are enjoying your job, traveling
+     and learning more about the Beta ray technology. I wanted to share some feedback
+     with you that I received. I want to help you be able to advance in your career
+     and I feel that this feedback will be helpful. I am excited that you are will
+     to share your perspectives on the findings, however if you could focus on the
+     data portion first, and highlight the main points, that would be really beneficial
+     to your audience. By being more concise it will allow the potential customers
+     and then CEO to focus on the facts of the report, which will allow them to make
+     a decision for themselves. I understand that this is probably a newer to writing
+     the reports, and I don''t think that anyone has shown you an example of how the
+     reports are usually written, so I have sent you some examples for you to review.
+     I think that you are doing a good job learning and with this little tweak in the
+     report writing you will be able to advance in your career. In order to help you,
+     if you don''t mind, I would like to review the report before you submit it and
+     then we can work together to ensure it will be a great report. I understand that
+     you really enjoy providing your perspectives on the technology and recommendations
+     on how it can be used, so we will find a spot for that in the report as well,
+     but perhaps in a different section. Thank you so much for your time today and
+     I look forward to working with you. '
+ - text: Hi Jonathan. I have been away a long time and unable to have regular discussions
+     with you. As your manager, I feel responsible for your performance and would love
+     to you you grow and perform better. I understand that you are travelling and gaining
+     so much information that it can be overwhelming. But our role is to present only
+     the most relevant and useful information in our report to the Senior management
+     and clients. I have received feedback that they are facing some trouble with the
+     reports and would like some changes. Let us focus on our project specifications
+     and only present the required details. Your detailed insights may be presented
+     at a later stage or as a separate report for evaluation. You may take up a course
+     or training on the subject and I am also there if you need any help. If you are
+     looking forward to a career growth next year, we need this to be a successful
+     assignment.
+ - text: Hi Jonathan, and I hope your travels are going well. As soon as you get a
+     chance, I would like to catch up on the reports you are creating for the Beta
+     projects. Your contributions have been fantastic, but we need to limit the commentary
+     and make them more concise. I would love to get your perspective and show you
+     an example as well. Our goal is to continue to make you better at what you do
+     and to deliver an excellent customer experience. Looking forward to tackling
+     this together and to your dedication to being great at what you do. Safe travels
+     and I look forward to your call.
+ pipeline_tag: text-classification
+ inference: true
+ base_model: sentence-transformers/all-MiniLM-L6-v2
+ model-index:
+ - name: SetFit with sentence-transformers/all-MiniLM-L6-v2
+   results:
+   - task:
+       type: text-classification
+       name: Text Classification
+     dataset:
+       name: Unknown
+       type: unknown
+       split: test
+     metrics:
+     - type: accuracy
+       value: 0.5909090909090909
+       name: Accuracy
+ ---
+
+ # SetFit with sentence-transformers/all-MiniLM-L6-v2
+
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
+
+ The model has been trained using an efficient few-shot learning technique that involves:
+
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
+
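Step 1 in the README text above relies on turning a handful of labeled texts into many sentence pairs. The sketch below illustrates that idea in simplified form: for every text, one same-label partner (target 1.0) and one different-label partner (target 0.0) are drawn per iteration. The function name `make_contrastive_pairs` and the placeholder texts are hypothetical, and the SetFit library's own sampler may differ in its details.

```python
import random
from typing import List, Tuple


def make_contrastive_pairs(
    texts: List[str], labels: List[int], num_iterations: int, seed: int = 42
) -> List[Tuple[str, str, float]]:
    """Build (anchor, partner, target) triples for a similarity loss.

    Assumes every class has at least two texts and at least two classes exist.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_iterations):
        for i, (anchor, label) in enumerate(zip(texts, labels)):
            same = [t for j, t in enumerate(texts) if labels[j] == label and j != i]
            other = [t for j, t in enumerate(texts) if labels[j] != label]
            pairs.append((anchor, rng.choice(same), 1.0))   # positive pair
            pairs.append((anchor, rng.choice(other), 0.0))  # negative pair
    return pairs


# Placeholder data: two texts per class.
texts = ["class-1 text A", "class-1 text B", "class-0 text A", "class-0 text B"]
labels = [1, 1, 0, 0]
pairs = make_contrastive_pairs(texts, labels, num_iterations=3)
print(len(pairs))
```

Each iteration yields two pairs per text, so four texts and three iterations give 24 pairs.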
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
+ - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
+ - **Maximum Sequence Length:** 256 tokens
+ - **Number of Classes:** 2 classes
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+
+ ### Model Labels
+ | Label | Examples |
+ |:------|:---------|
+ | 1 | <ul><li>"Jonathan, I hope you are well - I am very excited that you are part of this development team and really appreciate all the support you give to us; while doing this some comments have arise that can be opportunity areas to improve your work and get this program ahead.1. The communication between team members is not clear and improvements can be done to this: by this I mean to connect more with other team members before submitting your reports.2. One of the reasons you were chosen is because of your enthusiastic attitude and knowledge, but too much information sometimes can harm the delivery reports that needs to be concise and business oriented. 3.Please forward me your latest report so we can discuss it furthermore when I come back and see what can be improve and we can work from there.4. Please don't be discourage, these are opportunity areas that we can engage and as always keep up the good work. Have a great week. Thanks"</li><li>"Hi Jonathan, I hope this message finds you well. I hear things are going well with the Beta project. That said, Terry mentioned that there were some issues with the reports. From what I understand, they would like them to be more concise and straight to the point, as well as more business focused. I recommend you reach out to Terry so you both could review in detail one of the reports he submits. This should help you help you align to their expectations. Additionally, i'd be happy to review the reports before you send them off to Terry and provide my feedback. I know this project is important to you, so please let me know how this meeting goes and how else I can help. Regards, William"</li><li>'Hi Jonathan, Good to hear you are enjoying the work. I would like to discuss with you feedback on your assignment and the reports you are producing. It is very important to understand the stakeholders who will be reading your report. You may have gathered a lot of good information BUT do not put them all on your reports. The report should state facts and not your opinions. Create reports for the purpose and for the audience. I would also suggest that you reach out to Terry to understand what information is needed on the reports you produce.Having said that, the additional insights you gathered are very important too. Please add them to our knowledge repository and share with the team. It will be a great sharing and learning experience. You are very valuable in your knowledge and I think that it would benefit you and the organization tremendously when you are to channelize your insights and present the facts well. I would encourage you to enroll for the business writing training course. Please choose a date from the learning calendar and let me know. Regards, William'</li></ul> |
+ | 0 | <ul><li>'Good Afternoon Jonathan, I hope you are well and the travelling is not too exhausting. I wanted to touch base with you to see how you are enjoying working with the Beta project team? I have been advised that you are a great contributor and are identifying some great improvements, so well done. I understand you are completing a lot of reports and imagine this is quite time consuming which added to your traveling must be quite overwhelming. I have reviewed some of your reports and whilst they provide all the technical information that is required, they are quite lengthy and i think it would be beneficial for you to have some training on report structures. This would mean you could spend less time on the reports by providing only the main facts needed and perhaps take on more responsibility. When the reports are reviewed by higher management they need to be able to clearly and quickly identify any issues. Attending some training would also be great to add to your career profile for the future. In the meantime perhaps you could review your reports before submitting to ensure they are clear and consise with only the technical information needed,Let me know your thoughts. Many thanks again and well done for all your hard work. Kind regards William'</li><li>'Jonathan, First I want to thank you for your help with the Beta project. However, it has been brought to my attention that perhaps ABC-5 didn\'t do enough to prepare you for the extra work and I would like to discuss some issues. The nature of these reports requires them to be technical in nature. Your insights are very valuable and much appreciated but as the old line goes "please give me just the facts". Given the critical nature of the information you are providing I can\'t stress the importance of concise yet detail factual reports. I would like to review your reports as a training exercise to help you better meet the team requirements. Given that there are some major reports coming up in the immediate future, I would like you to review some training options and then present a report for review. Again your insights are appreciated but we need to make sure we are presenting the end-use with only the information they need to make a sound business decision. I also understand you would like to grow into a leadership position so I would like to discuss how successfully implementing these changes would be beneficial in demonstrating an ability to grow and take on new challenges. '</li><li>'Hi Jonathan, I wanted to have a discussion with you but since you are travelling i am sharing in this mailThis is related to Beta project and reports coming from there.While we are all excited by the passion and enthusiasm you are bringing i wanted to share some early feedback with you. 1.Please try to be concise in reports and mention facts that teams can refer . We love opinions but lets save those for our brainstorming discussions. 2.For Business writing as you are getting started to help you set up for success we are nominating you for a training program so that your reports are way more effective. I hope as you set on your growth journey and take larger roles a superb feedback from your peers and stakeholders will help. I truly believe above two points can really help you take you there. Wishing you all the best and do share in case you have feedback or inputs from your side. Regards William'</li></ul> |
+
+ ## Evaluation
+
+ ### Metrics
+ | Label | Accuracy |
+ |:--------|:---------|
+ | **all** | 0.5909 |
+
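The card does not state the size of the test split, but the unrounded accuracy in the front matter (`0.5909090909090909`) can hint at it. As a hedged sketch, the snippet below recovers the simplest fraction matching that value; the bound of 100 on the denominator is an arbitrary assumption, and any multiple of the result (for example 26 of 44) would fit equally well.

```python
from fractions import Fraction

# Unrounded accuracy as reported in the README front matter.
accuracy = 0.5909090909090909

# Simplest fraction with a denominator of at most 100 (assumed bound).
ratio = Fraction(accuracy).limit_denominator(100)
print(ratio)  # 13/22
```

Read this way, the score is consistent with 13 correct predictions out of 22 test examples, which would be only modestly above the 50% expected from guessing on a balanced two-class set.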
+ ## Uses
+
+ ### Direct Use for Inference
+
+ First install the SetFit library:
+
+ ```bash
+ pip install setfit
+ ```
+
+ Then you can load this model and run inference.
+
+ ```python
+ from setfit import SetFitModel
+
+ # Download from the 🤗 Hub
+ model = SetFitModel.from_pretrained("sijan1/empathy_model2")
+ # Run inference
+ preds = model("Hi Jonathan, and I hope your travels are going well. As soon as you get a chance, I would like to catch up on the reports you are creating for the Beta projects. Your contributions have been fantastic, but we need to limit the commentary and make them more concise. I would love to get your perspective and show you an example as well. Our goal is to continue to make you better at what you do and to deliver an excellent customer experience. Looking forward to tackling this together and to your dedication to being great at what you do. Safe travels and I look forward to your call.")
+ ```
+
+ <!--
+ ### Downstream Use
+
+ *List how someone could finetune this model on their own dataset.*
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:-------|:----|
+ | Word count | 129 | 199.5 | 308 |
+
+ | Label | Training Sample Count |
+ |:------|:----------------------|
+ | 0 | 4 |
+ | 1 | 4 |
+
+ ### Training Hyperparameters
+ - batch_size: (16, 16)
+ - num_epochs: (1, 1)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - num_iterations: 20
+ - body_learning_rate: (2e-05, 2e-05)
+ - head_learning_rate: 2e-05
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: False
+ - warmup_proportion: 0.1
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: False
+
+ ### Training Results
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:-----:|:----:|:-------------:|:---------------:|
+ | 0.05 | 1 | 0.238 | - |
+
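The `Epoch` value of 0.05 logged at step 1 can be reconciled with the hyperparameters and sample counts listed above. The arithmetic below is a sketch that assumes each iteration produces one positive and one negative pair per training sample, as in the SetFit paper; the exact pair count used by SetFit 1.0.3 is not stated in the card, so the formula is an assumption.

```python
import math

num_samples = 4 + 4    # training sample counts for labels 0 and 1
num_iterations = 20    # from the hyperparameter list
batch_size = 16        # embedding-phase batch size

# Assumption: two pairs (one positive, one negative) per sample per iteration.
num_pairs = 2 * num_iterations * num_samples
steps_per_epoch = math.ceil(num_pairs / batch_size)
epoch_at_first_step = 1 / steps_per_epoch

print(num_pairs, steps_per_epoch, epoch_at_first_step)
```

Under that assumption the run has 320 pairs and 20 optimizer steps in its single epoch, so step 1 corresponds to 1/20 of an epoch, matching the table.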
+ ### Framework Versions
+ - Python: 3.10.12
+ - SetFit: 1.0.3
+ - Sentence Transformers: 2.5.0
+ - Transformers: 4.37.2
+ - PyTorch: 2.1.0+cu121
+ - Datasets: 2.17.1
+ - Tokenizers: 0.15.2
+
+ ## Citation
+
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+     doi = {10.48550/ARXIV.2209.11055},
+     url = {https://arxiv.org/abs/2209.11055},
+     author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+     keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+     title = {Efficient Few-Shot Learning Without Prompts},
+     publisher = {arXiv},
+     year = {2022},
+     copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "sentence-transformers/all-MiniLM-L6-v2",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 6,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.37.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "__version__": {
+     "sentence_transformers": "2.0.0",
+     "transformers": "4.6.1",
+     "pytorch": "1.8.1"
+   },
+   "prompts": {},
+   "default_prompt_name": null
+ }
config_setfit.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "labels": null,
+   "normalize_embeddings": false
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6f3eb868f1f7429feaef5f0f01cd2e037a48b29422c67a6c5af8bd2960b42adc
+ size 90864192
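The `size 90864192` recorded in this pointer can be sanity-checked against the values in `config.json`. The sketch below counts parameters for a standard BERT encoder, assuming the usual `BertModel` layout (embeddings with LayerNorm, query/key/value/output projections, a two-layer feed-forward block, and a pooler) and 4 bytes per `float32` weight. The helper name `bert_parameter_count` is hypothetical, and the leftover bytes are only presumed to be the safetensors header.

```python
config = {
    "vocab_size": 30522,
    "hidden_size": 384,
    "intermediate_size": 1536,
    "num_hidden_layers": 6,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
}


def bert_parameter_count(cfg: dict, with_pooler: bool = True) -> int:
    """Count weights and biases of a standard BERT encoder (assumed layout)."""
    h = cfg["hidden_size"]
    inter = cfg["intermediate_size"]
    layer_norm = 2 * h  # scale and bias
    embeddings = (
        cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    ) * h + layer_norm
    attention = 4 * (h * h + h) + layer_norm  # Q, K, V, output projection
    feed_forward = (h * inter + inter) + (inter * h + h) + layer_norm
    pooler = h * h + h if with_pooler else 0
    return embeddings + cfg["num_hidden_layers"] * (attention + feed_forward) + pooler


params = bert_parameter_count(config)
weight_bytes = params * 4            # float32
remainder = 90864192 - weight_bytes  # bytes not explained by the weights
print(params, weight_bytes, remainder)
```

Under these assumptions the encoder has about 22.7 million parameters, and the weights account for all but roughly 11 KB of the file.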
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c7d06440295de113aa0ed56edcc629605071ad370a8974e32247cb9c82ab2ae1
+ size 3935
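Both `model.safetensors` and `model_head.pkl` are committed as Git LFS pointer files: three `key value` lines that name the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading such a pointer follows; the function name `parse_lfs_pointer` is hypothetical and the parser handles only the three keys shown in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content for model_head.pkl, copied from the diff above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:c7d06440295de113aa0ed56edcc629605071ad370a8974e32247cb9c82ab2ae1
size 3935
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algorithm"], pointer["size"])
```

A downloaded file could then be compared against `digest` and `size` to confirm it matches the committed pointer.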
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
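The module list ends with a `Normalize` stage, which suggests that the embeddings leaving the body are scaled to unit length. As a small pure-Python illustration of why that matters, the sketch below shows that for unit-length vectors the dot product equals the cosine similarity; the helper names `l2_normalize` and `dot` and the 2-dimensional toy vectors are hypothetical.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


a = l2_normalize([3.0, 4.0])  # roughly [0.6, 0.8]
b = l2_normalize([4.0, 3.0])  # roughly [0.8, 0.6]
cosine_similarity = dot(a, b)
print(round(cosine_similarity, 2))
```

Because the inputs are already normalized, no further division by vector lengths is needed to obtain the cosine similarity.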
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 256,
+   "do_lower_case": false
+ }
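With `max_seq_length` set to 256, inputs that tokenize to more than 256 tokens are cut off. The README reports training texts of up to 308 words, so the longest ones were probably truncated, since a word maps to one or more WordPiece tokens. The sketch below shows right-side truncation under the assumption that the limit includes the `[CLS]` (id 101) and `[SEP]` (id 102) tokens listed in `tokenizer_config.json`; the function name is hypothetical and this is not the tokenizer's own code.

```python
from typing import List

CLS_ID = 101  # "[CLS]" id from tokenizer_config.json
SEP_ID = 102  # "[SEP]" id from tokenizer_config.json


def truncate_with_special_tokens(token_ids: List[int], max_seq_length: int = 256) -> List[int]:
    """Keep the first tokens that fit, then wrap them in [CLS] ... [SEP]."""
    budget = max_seq_length - 2  # reserve room for the two special tokens
    return [CLS_ID] + list(token_ids[:budget]) + [SEP_ID]


long_input = list(range(1000, 1300))  # 300 placeholder token ids
encoded = truncate_with_special_tokens(long_input)
print(len(encoded))
```

Anything beyond the first 254 content tokens is dropped, so information near the end of a long message would not reach the classifier.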
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,64 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 128,
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
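The tokenizer settings above specify `"padding_side": "right"` and map `[PAD]` to id 0. When several texts are batched, shorter sequences are padded on the right and an attention mask marks which positions are real tokens. The sketch below illustrates that convention in pure Python; `pad_batch` is a hypothetical helper, and the token ids other than 0, 101, and 102 are made up.

```python
from typing import List, Tuple

PAD_ID = 0  # "[PAD]" id from tokenizer_config.json


def pad_batch(sequences: List[List[int]], pad_id: int = PAD_ID) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad every sequence to the longest one and build attention masks."""
    longest = max(len(seq) for seq in sequences)
    input_ids = [list(seq) + [pad_id] * (longest - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (longest - len(seq)) for seq in sequences]
    return input_ids, attention_mask


batch = [[101, 7592, 102], [101, 7592, 2088, 999, 102]]
input_ids, attention_mask = pad_batch(batch)
print(input_ids[0], attention_mask[0])
```

The mask is what lets a mean-pooling layer ignore padded positions when averaging token embeddings.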
vocab.txt ADDED
The diff for this file is too large to render. See raw diff