One thing to note is that tokenizer.model_max_length doesn't work for tokenizers you train yourself - it's always set to the sentinel value int(1e30). The reason is that tokenization_utils_base.py#L1857 only sets model_max_length for the hard-coded list of pre-trained Hugging Face tokenizers. If you train your own tokenizer, I don't see any way of setting it from the config when loading with AutoTokenizer.from_pretrained().
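Here is a minimal sketch of the behaviour and the workaround I fall back on, assuming a tokenizer you trained yourself saved at a hypothetical path/to/my-tokenizer; passing model_max_length as a keyword argument (or setting the attribute after loading) overrides the sentinel:

```python
from transformers import AutoTokenizer

# Hypothetical directory holding a tokenizer you trained yourself,
# i.e. not one of the hard-coded pre-trained checkpoints.
tokenizer = AutoTokenizer.from_pretrained("path/to/my-tokenizer")

# For a custom tokenizer this prints int(1e30)
# (1000000000000000019884624838656), not the real context length.
print(tokenizer.model_max_length)

# Workaround: pass the value explicitly at load time ...
tokenizer = AutoTokenizer.from_pretrained(
    "path/to/my-tokenizer", model_max_length=512
)

# ... or just overwrite the attribute after loading.
tokenizer.model_max_length = 512
```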