Two Whisper classes for generation but same functionalities?

Hi @alerio,

I had the same question, and it turns out that WhisperForCausalLM exists solely to load the assistant model for speculative decoding.

Instead of loading the whole encoder-decoder, WhisperForCausalLM loads only the decoder, with a language modeling head on top.

You can find more details in the initial PR by Patrick: [WhisperForCausalLM] Add WhisperForCausalLM for speculative decoding by patrickvonplaten · Pull Request #27195 · huggingface/transformers · GitHub