Token indices sequence length is longer than the specified maximum sequence length

Hi, when running the run_t5_mlm_flax.py script I am getting this warning:

Token indices sequence length is longer than the specified maximum sequence length for this model (523 > 512). Running this sequence through the model will result in indexing errors.

I have specified model_max_length=512 within the tokenizer,
and passed --max_seq_length="512" to the run_t5_mlm_flax.py script.

Unfortunately I still get the same warning.

Hi @antoine2323231 , can you try the following code to see if it works?

tokenizer(batch_sentences, padding='max_length', truncation=True)
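For context, the message is emitted at tokenization time whenever a sequence exceeds the tokenizer's model_max_length, and truncation=True is what caps it. A plain-Python sketch of that behaviour (illustrative only; the real logic lives inside `transformers`, and `encode` here is a hypothetical stand-in):

```python
MODEL_MAX_LENGTH = 512  # assumed model limit, as in the warning

def encode(token_ids, truncation=False, max_length=MODEL_MAX_LENGTH):
    """Return token ids, warning if the sequence exceeds the model limit."""
    if len(token_ids) > max_length:
        if truncation:
            # with truncation=True the tokenizer silently cuts the sequence
            return token_ids[:max_length]
        print(f"Token indices sequence length is longer than the specified "
              f"maximum sequence length ({len(token_ids)} > {max_length}).")
    return token_ids

ids = list(range(523))  # 523 tokens, as in the reported warning
assert len(encode(ids, truncation=True)) == 512   # truncated
assert len(encode(ids)) == 523                    # only warns, ids unchanged
```

Note that without truncation the warning is harmless at tokenization time; the indexing error only occurs if the over-length sequence is actually fed to the model.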

Hey lianghsun, I tried that but I'm getting the same result. It is strange…

I am also getting a similar error. Did you resolve this?

My workaround is to reduce the length of the prompt. For example, if you are doing question answering, the context + question should be shorter than 512 tokens.
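That budgeting step can be sketched like this (a hypothetical helper; `fit_qa_inputs` and the variable names are illustrative, not part of any library):

```python
MAX_LEN = 512  # model limit from the warning above

def fit_qa_inputs(context_ids, question_ids, max_len=MAX_LEN):
    """Trim the context so that context + question fits in the model limit.

    Keeps the question intact and cuts the context, since the question
    is usually short and must survive whole.
    """
    budget = max_len - len(question_ids)
    if budget < 0:
        raise ValueError("question alone exceeds the model limit")
    return context_ids[:budget], question_ids

# e.g. a 600-token context plus a 50-token question
ctx, q = fit_qa_inputs(list(range(600)), list(range(50)))
assert len(ctx) + len(q) <= 512
```

A smarter variant would trim at sentence boundaries or keep the passage most relevant to the question, but simple head-truncation is often enough to silence the warning.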