I'm trying to understand why SDPA and Flash Attention are incompatible with `output_attentions`.
I'm trying to improve the performance of my Whisper setup and want to try one of these attention implementations instead of eager. For my application, though, I need word-level timestamps, and that seems to work only with `eager` attention.
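For reference, this is roughly my setup (the model id and audio are placeholders for my real inputs):

```python
import numpy as np
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "openai/whisper-small"  # placeholder; my real checkpoint differs

# Explicitly request SDPA instead of eager attention
model = WhisperForConditionalGeneration.from_pretrained(model_id, attn_implementation="sdpa")
processor = WhisperProcessor.from_pretrained(model_id)

audio = np.random.randn(16000 * 5).astype(np.float32)  # stand-in for 5 s of 16 kHz audio
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# return_token_timestamps is what I need for word-level timing, and it's the
# flag that seems to force output_attentions=True under the hood
out = model.generate(inputs.input_features, return_token_timestamps=True)
```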
It seems like in the code, `return_token_timestamps` sets `output_attentions` to `True`. Is that necessary for the model to output token timestamps? I couldn't fully trace what that flag does in this case; maybe someone can help.
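The code I mean is in `WhisperForConditionalGeneration.generate`; paraphrasing from memory (not a verbatim quote), it does something like:

```python
# Paraphrase of what I found in generation_whisper.py (not verbatim):
# when token timestamps are requested, generate() forces the attention
# weights to be included in the generation output.
if return_token_timestamps:
    kwargs["output_attentions"] = True
    kwargs["return_dict_in_generate"] = True
```

Further down, the cross-attention weights look like they get fed into `_extract_token_timestamps`, which appears to run dynamic time warping over them to place the tokens in time, so maybe the weights really are required?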
In `WhisperSdpaAttention` and `WhisperFlashAttention2`, the forward pass short-circuits, saying that `output_attentions` isn't compatible with those implementations. But as far as I can tell, the `eager` implementation doesn't do anything special for `output_attentions`, so what makes these two incompatible with it?
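Concretely, the short circuits I mean look roughly like this (paraphrased from `modeling_whisper.py`, not verbatim):

```python
# WhisperFlashAttention2.forward refuses outright, roughly:
if output_attentions:
    raise ValueError("WhisperFlashAttention2 attention does not support output_attentions")

# WhisperSdpaAttention.forward warns and falls back instead, roughly:
if output_attentions or layer_head_mask is not None:
    # logs that scaled_dot_product_attention doesn't support output_attentions=True
    return super().forward(...)  # i.e. the eager WhisperAttention path
```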
I thought `output_attentions` just collected the attention weights and returned them. Why would that interfere with these other attention implementations?
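My best guess: eager attention materializes the softmax weights as an actual tensor (which is what `output_attentions` collects), whereas SDPA runs the whole computation inside a fused kernel and only ever returns the output. A minimal sketch of what I mean, with made-up shapes:

```python
import math
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim), arbitrary sizes for illustration
q, k, v = (torch.randn(1, 8, 16, 64) for _ in range(3))

# "Eager" attention: the weight matrix exists as a real tensor,
# so it can be collected and returned via output_attentions
attn_weights = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1)
eager_out = attn_weights @ v

# SDPA: same math, fused kernel, only the output comes back;
# there is no weights tensor to hand out
sdpa_out = F.scaled_dot_product_attention(q, k, v)

print(torch.allclose(eager_out, sdpa_out, atol=1e-5))  # True, up to numerics
```

If that's the right picture, is falling back to the eager path the only way to get the weights out?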
If anyone can help me understand this better, I’d appreciate it!