Are there any differences between WhisperForConditionalGeneration and WhisperForCausalLM? Judging from the documentation, they seem very similar to each other.
For WhisperForConditionalGeneration, it says:
The Whisper Model with a language modeling head. Can be used for automatic speech recognition. This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.)
And for WhisperForCausalLM:
Whisper decoder with a language modeling head on top (linear layer with weights tied to the input embeddings). This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.)
It looks like both of them have a language modeling head on top, but are there any other differences between these classes?
Best