Hello,

The truncation=True parameter of the camembert-large tokenizer does not seem to have any effect. When running this example:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("camembert/camembert-large")
tokenizer(["Some long piece of text", "Some other long piece of text"], padding=True, truncation=True, return_tensors="pt")
I get the following warning:

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Inference then fails with an exception on long sentences, because the tokenizer never truncates the input down to the model's 512-token limit.
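In the meantime, passing max_length explicitly (or setting model_max_length on the tokenizer) seems to work around it. A sketch, assuming camembert-large's 512-token limit:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("camembert/camembert-large")

# Workaround 1: give truncation an explicit target length
enc = tokenizer(
    ["Some long piece of text", "Some other long piece of text"],
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
print(enc["input_ids"].shape)  # sequence dimension capped at 512

# Workaround 2: set the missing model_max_length on the tokenizer,
# after which truncation=True alone behaves as expected
tokenizer.model_max_length = 512
enc = tokenizer(
    ["Some long piece of text", "Some other long piece of text"],
    padding=True,
    truncation=True,
    return_tensors="pt",
)
```

Both approaches silence the warning on my end, but they only paper over the missing model_max_length in the checkpoint's tokenizer config.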
Should I raise an issue on the Transformers repo, or does this belong somewhere else?