Set tokenizer "model_max_length" property to 8192
Somehow composer exported the `model_max_length` tokenizer property as a very large value instead of 8192.
This breaks the `tokenizer.model_max_length` attribute that some pipelines rely on.
Since we already corrected `max_pos_embeddings`, I suggest we also fix this for consistency, although it is not a hard limit.
See [this issue](https://github.com/AnswerDotAI/ModernBERT/issues/166) for more information.
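
For reference, a minimal sketch of how the exported value shows up downstream and how to work around it locally until this change is picked up. The `answerdotai/ModernBERT-base` checkpoint name is only illustrative here, this PR does not pin a specific checkpoint:

```python
from transformers import AutoTokenizer

# Illustrative checkpoint name (assumption, not part of this PR).
tok = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

# Before this fix, model_max_length reflects the huge exported placeholder,
# so truncation=True effectively never truncates and long inputs can exceed
# the 8192-token position budget at inference time.
print(tok.model_max_length)

# Local workaround until the corrected tokenizer_config.json lands:
tok.model_max_length = 8192
batch = tok(["some very long document " * 2000], truncation=True, padding=True)
print(len(batch["input_ids"][0]))  # now capped at 8192
```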
tokenizer_config.json (+1 -1):

```diff
@@ -932,7 +932,7 @@
   "clean_up_tokenization_spaces": true,
   "cls_token": "[CLS]",
   "mask_token": "[MASK]",
-  "model_max_length":
+  "model_max_length": 8192,
   "pad_token": "[PAD]",
   "sep_token": "[SEP]",
   "tokenizer_class": "PreTrainedTokenizerFast",
```