Hello,

The truncation=True parameter of the camembert-large tokenizer does not seem to have any effect. When running this example:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("camembert/camembert-large")
tokenizer(["Some long piece of text", "Some other long piece of text"], padding=True, truncation=True, return_tensors="pt")
I get the following warning:

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Inference then fails with an exception on long sentences, because the tokenizer never truncates the input down to the model's 512-token limit.
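In the meantime, passing max_length explicitly (or setting model_max_length on the tokenizer) seems to work around it. A sketch, assuming camembert-large's 512-token limit:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("camembert/camembert-large")

# Workaround 1: give truncation an explicit target length
enc = tokenizer(
    ["Some long piece of text", "Some other long piece of text"],
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
print(enc["input_ids"].shape)  # sequence dimension capped at 512

# Workaround 2: set the missing model_max_length on the tokenizer,
# after which truncation=True alone behaves as expected
tokenizer.model_max_length = 512
enc = tokenizer(
    ["Some long piece of text", "Some other long piece of text"],
    padding=True,
    truncation=True,
    return_tensors="pt",
)
```

Both approaches silence the warning on my end, but they only paper over the missing model_max_length in the checkpoint's tokenizer config.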
Should I raise an issue on the Transformers repo, or does this belong somewhere else?