NohTow committed on
Commit
9e1cd05
1 Parent(s): 6e46162

Set tokenizer "model_max_length" property to 8192


Somehow, Composer exported the tokenizer's `model_max_length` property as a huge sentinel value instead of 8192.
This breaks the `tokenizer.model_max_length` call that some pipelines rely on.

Since we already corrected `max_pos_embeddings`, I suggest fixing this as well for consistency, although it is not a hard limit.
See [this issue](https://github.com/AnswerDotAI/ModernBERT/issues/166) for more information.
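
For context, a minimal sketch (with a hypothetical repo id, not necessarily this model) of how the bad value surfaces in downstream code:

```python
from transformers import AutoTokenizer

# Hypothetical repo id, for illustration only.
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

# Before the fix this printed 1000000000000000019884624838656
# (the "unset" sentinel) instead of 8192.
print(tokenizer.model_max_length)

# Pipelines commonly truncate against that value; with the sentinel
# this effectively disables truncation (or errors on overflow):
encoded = tokenizer(
    "some very long document " * 2000,
    truncation=True,
    max_length=tokenizer.model_max_length,
)
```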

Files changed (1)
  1. tokenizer_config.json +1 -1
tokenizer_config.json CHANGED
@@ -932,7 +932,7 @@
   "clean_up_tokenization_spaces": true,
   "cls_token": "[CLS]",
   "mask_token": "[MASK]",
- "model_max_length": 1000000000000000019884624838656,
+ "model_max_length": 8192,
   "pad_token": "[PAD]",
   "sep_token": "[SEP]",
   "tokenizer_class": "PreTrainedTokenizerFast",