Set tokenizer "model_max_length" property to 8192
Somehow composer exported the `model_max_length` tokenizer property as a very large value instead of 8192.
This breaks the `tokenizer.model_max_length` attribute that some pipelines rely on.
Since we already corrected `max_pos_embeddings`, I suggest we also fix this for consistency, although it is not a hard limit.
See [this issue](https://github.com/AnswerDotAI/ModernBERT/issues/166) for more information.
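
For reference, a minimal sketch of how the exported value shows up downstream and how to work around it locally until this change is picked up. The `answerdotai/ModernBERT-base` checkpoint name is only illustrative here, this PR does not pin a specific checkpoint:

```python
from transformers import AutoTokenizer

# Illustrative checkpoint name (assumption, not part of this PR).
tok = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

# Before this fix, model_max_length reflects the huge exported placeholder,
# so truncation=True effectively never truncates and long inputs can exceed
# the 8192-token position budget at inference time.
print(tok.model_max_length)

# Local workaround until the corrected tokenizer_config.json lands:
tok.model_max_length = 8192
batch = tok(["some very long document " * 2000], truncation=True, padding=True)
print(len(batch["input_ids"][0]))  # now capped at 8192
```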
tokenizer_config.json (+1 -1):

```diff
@@ -932,7 +932,7 @@
   "clean_up_tokenization_spaces": true,
   "cls_token": "[CLS]",
   "mask_token": "[MASK]",
-  "model_max_length":
+  "model_max_length": 8192,
   "pad_token": "[PAD]",
   "sep_token": "[SEP]",
   "tokenizer_class": "PreTrainedTokenizerFast",
```