I was following @sanchit-gandhi's tutorial (https://huggingface.co/blog/fine-tune-whisper), but I got the following warning: "The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results."
This warning was introduced in a recent transformers release, and I was trying to understand whether it can affect the fine-tuning process. If so, how can we avoid it?
To my knowledge, the cause of the warning is that eos_token_id = pad_token_id = 50257 in the tokenizer, but it's fine because we replace the padding token ids with -100 at the line labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100) in the DataCollatorSpeechSeq2SeqWithPadding class.
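For reference, here is a rough sketch of that collator from the tutorial (paraphrased, not an exact copy of the blog post's code, so details such as the BOS check may differ slightly):

```python
import torch
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    processor: Any

    def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
        # pad the log-mel input features to the longest example in the batch
        input_features = [{"input_features": f["input_features"]} for f in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")

        # pad the tokenised labels separately
        label_features = [{"input_ids": f["labels"]} for f in features]
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")

        # replace padding token ids with -100 so they are ignored by the loss;
        # this is why the shared pad/eos id does not matter during training
        labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100)

        # if a BOS token was prepended during tokenisation, cut it here,
        # since it gets appended again later
        if (labels[:, 0] == self.processor.tokenizer.bos_token_id).all().cpu().item():
            labels = labels[:, 1:]

        batch["labels"] = labels
        return batch
```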
@duonguyen that makes sense, thanks!
I ran into the same error message when running this basic HF Whisper inference code:
https://huggingface.co/docs/transformers/en/model_doc/whisper#inference
Does anyone know how to fix it? It seems that in Whisper's default config, pad_token_id = 50256 and eos_token_id = 50256. How can I make them not equal?
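I don't think you're meant to make them unequal; the Whisper tokenizer reuses the end-of-text token as the padding token, and the fix is to give generate() an attention_mask. Here is a minimal short-form sketch (whisper-tiny.en is used purely as an example checkpoint, and whether the mask is actually forwarded depends on your transformers version, as discussed below):

```python
import numpy as np
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")

# the two ids the warning is complaining about; they are equal for Whisper
print(model.config.pad_token_id, model.config.eos_token_id)

# dummy 1-second clip at 16 kHz; replace with real audio
audio = np.zeros(16000, dtype=np.float32)

# ask the feature extractor to also return an attention mask over the features
inputs = processor(audio, sampling_rate=16000, return_tensors="pt", return_attention_mask=True)

generated_ids = model.generate(inputs.input_features, attention_mask=inputs.attention_mask)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```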
This example for long-form transcription doesn't work out of the box either:
https://huggingface.co/docs/transformers/v4.42.0/en/model_doc/whisper#transformers.WhisperForConditionalGeneration
It runs into the same error message as above.
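For reference, that docs example boils down to roughly the following (a paraphrase, not an exact copy of the page; the checkpoint and the number of clips are just what I picked for a quick test):

```python
from datasets import Audio, load_dataset
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")

# clips longer than 30 s trigger Whisper's long-form generation path
ds = load_dataset("distil-whisper/meanwhile", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16000))
audio = [ds[i]["audio"]["array"] for i in range(2)]

# do not truncate to 30 s, pad to the longest clip, and request the attention mask
inputs = processor(
    audio,
    sampling_rate=16000,
    return_tensors="pt",
    truncation=False,
    padding="longest",
    return_attention_mask=True,
)

# the attention_mask from `inputs` is passed along via **inputs
generated_ids = model.generate(**inputs, return_timestamps=True)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```

Even with the attention mask included in inputs, the warning still showed up, which is what led to the change below.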
I added the line kwargs["attention_mask"] = attention_mask here:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/generation_whisper.py#L487
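In other words, the local change amounts to something like this inside WhisperGenerationMixin.generate (paraphrased; I have not reproduced the surrounding code, and the exact location may shift between versions):

```python
# src/transformers/models/whisper/generation_whisper.py, inside
# WhisperGenerationMixin.generate, where the kwargs for the inner
# decoding call are assembled (around the linked line):
kwargs["attention_mask"] = attention_mask  # forward the caller's mask explicitly
```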
That worked, and I got reasonable ASR results. My guess is that the transformers library changed some APIs at some point and Whisper stopped passing the attention_mask argument through properly.
Not sure what the optimal solution is.