One thing to note is that tokenizer.model_max_length doesn't work for tokenizers you train yourself - it's always set to the sentinel value int(1e30). The reason is that tokenization_utils_base.py#L1857 only sets model_max_length for the hard-coded list of pre-trained Hugging Face tokenizers. If you train your own tokenizer, I don't see any way of setting it from the config when loading with AutoTokenizer.from_pretrained().
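Here is a minimal sketch of the behaviour and the workaround I fall back on, assuming a tokenizer you trained yourself saved at a hypothetical path/to/my-tokenizer; passing model_max_length as a keyword argument (or setting the attribute after loading) overrides the sentinel:

```python
from transformers import AutoTokenizer

# Hypothetical directory holding a tokenizer you trained yourself,
# i.e. not one of the hard-coded pre-trained checkpoints.
tokenizer = AutoTokenizer.from_pretrained("path/to/my-tokenizer")

# For a custom tokenizer this prints int(1e30)
# (1000000000000000019884624838656), not the real context length.
print(tokenizer.model_max_length)

# Workaround: pass the value explicitly at load time ...
tokenizer = AutoTokenizer.from_pretrained(
    "path/to/my-tokenizer", model_max_length=512
)

# ... or just overwrite the attribute after loading.
tokenizer.model_max_length = 512
```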