Fine-tuning Whisper: attention mask not set and cannot be inferred

I was following @sanchit-gandhi's tutorial (https://huggingface.co/blog/fine-tune-whisper), but I got the following warning: “The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input’s attention_mask to obtain reliable results.”

This warning was introduced in a recent transformers release, and I was trying to understand whether it can affect the fine-tuning process. If so, how can we avoid it?

To my knowledge, the cause of the warning is that eos_token_id == pad_token_id == 50257 in the tokenizer, but it should be fine because we replace the pad token IDs with -100 at the line labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100) in the DataCollatorSpeechSeq2SeqWithPadding class.
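
For reference, here is a minimal, self-contained sketch of what that masking line does. The tensors are made up purely for illustration, with 50257 standing in for the shared pad/eos ID of the multilingual tokenizer:

```python
import torch

# Padded label IDs as they might come out of processor.tokenizer.pad(...)
input_ids = torch.tensor(
    [[50258, 1234, 5678, 50257, 50257],   # eos at position 3, one pad token
     [50258, 4321, 50257, 50257, 50257]]  # eos at position 2, two pad tokens
)
attention_mask = torch.tensor(
    [[1, 1, 1, 1, 0],
     [1, 1, 1, 0, 0]]
)

# Every position the tokenizer marked as padding becomes -100, which the
# cross-entropy loss ignores; the real eos token (mask == 1) is kept.
labels = input_ids.masked_fill(attention_mask.ne(1), -100)
print(labels)
# tensor([[50258,  1234,  5678, 50257,  -100],
#         [50258,  4321, 50257,  -100,  -100]])
```

So even though the pad and eos IDs collide, the attention mask returned by the tokenizer decides which positions are ignored in the loss.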


@duonguyen that makes sense, thanks!

I ran into the same error message when running the basic HF Whisper inference code:
https://huggingface.co/docs/transformers/en/model_doc/whisper#inference

Does anyone know how to fix it? It seems that in Whisper’s default config, pad_token_id = 50256 and eos_token_id = 50256. How can I make them not equal?
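
To double-check which IDs your checkpoint actually uses, you can inspect its config and generation config (the checkpoint name below is just an example, substitute the one you are loading):

```python
from transformers import WhisperForConditionalGeneration

# Example checkpoint; substitute whichever one you are loading.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")

print(model.config.pad_token_id, model.config.eos_token_id)
print(model.generation_config.pad_token_id, model.generation_config.eos_token_id)
```

As the warning text itself suggests, the usual remedy is to pass an attention_mask rather than to change the token IDs.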

This example for long-form transcription doesn’t work out of the box either:
https://huggingface.co/docs/transformers/v4.42.0/en/model_doc/whisper#transformers.WhisperForConditionalGeneration
It runs into the same error message as above.
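
For reference, the relevant part of that docs example looks roughly like this (paraphrased; the checkpoint and dataset names come from the docs page). Note that it already requests the attention mask from the processor and spreads it into generate, yet the warning still appears:

```python
from datasets import Audio, load_dataset
from transformers import AutoProcessor, WhisperForConditionalGeneration

processor = AutoProcessor.from_pretrained("openai/whisper-tiny.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")

# Long-form audio (> 30 s), resampled to 16 kHz.
ds = load_dataset("distil-whisper/meanwhile", "default", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
raw_audio = [ds[0]["audio"]["array"]]

# Do not truncate, pad to the longest clip, and return the attention mask.
inputs = processor(
    raw_audio,
    sampling_rate=16_000,
    return_tensors="pt",
    truncation=False,
    padding="longest",
    return_attention_mask=True,
)

generated_ids = model.generate(**inputs, return_timestamps=True)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```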

I added a line: kwargs["attention_mask"] = attention_mask here:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/generation_whisper.py#L487
That worked and I got reasonable ASR results. My guess is that the transformers library changed some APIs at some point and Whisper stopped passing the attention_mask argument through properly.
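
Before patching generation_whisper.py, it may be worth trying to pass the mask explicitly at the call site. A sketch, assuming model and inputs were built with return_attention_mask=True as in the snippet above; I am not certain this reaches every internal code path, which may be why the in-library change was needed:

```python
# Pass the attention mask explicitly instead of editing the library source.
generated_ids = model.generate(
    input_features=inputs.input_features,
    attention_mask=inputs.attention_mask,
    return_timestamps=True,
)
```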

Not sure what the optimal solution is.