Is it possible to use WhisperModel for an audio classification task?

I am wondering whether it is possible to use WhisperModel as a backbone for transfer learning on a speech classification task.

If possible, I would like to know how to connect a classification head to the outputs of the model.

For example, the output below has five keys:

import torch
from transformers import AutoModel, AutoFeatureExtractor

feature_extractor = AutoFeatureExtractor.from_pretrained("openai/whisper-small")
model = AutoModel.from_pretrained("openai/whisper-small")

# `audio` is a 16 kHz mono waveform (e.g. loaded with librosa or torchaudio)
features = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")

output = model(
    features.input_features,
    decoder_input_ids=torch.tensor([[model.config.decoder_start_token_id]]),
    output_hidden_states=True,
)

# odict_keys(['last_hidden_state', 'past_key_values', 'decoder_hidden_states', 'encoder_last_hidden_state', 'encoder_hidden_states'])
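For reference, here is how I have been inspecting the tensor outputs; the shape comments reflect my understanding rather than anything documented:

# Encoder output: one hidden vector per audio frame
print(output.encoder_last_hidden_state.shape)  # (batch, n_frames, d_model)?
# Decoder output: one hidden vector per decoder input token
print(output.last_hidden_state.shape)          # (batch, decoder_len, d_model)?
# Per-layer hidden states: embeddings plus one entry per layer
print(len(output.encoder_hidden_states), len(output.decoder_hidden_states))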

If I can attach a classification head, which of these outputs should it be connected to?
Also, what do these outputs correspond to, and how do they map onto the blocks in the architecture diagram in the Whisper paper?
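For concreteness, here is a rough sketch of what I have in mind: keep only the encoder, mean-pool its last hidden state over time, and feed that into a linear head. The class name, the pooling choice, and the head are my own guesses, not anything from the paper or the library:

import torch.nn as nn
from transformers import AutoModel

class WhisperEncoderClassifier(nn.Module):
    """Hypothetical wrapper: Whisper encoder + mean pooling + linear head."""

    def __init__(self, num_labels):
        super().__init__()
        # Keep only the encoder; the decoder is for text generation.
        self.encoder = AutoModel.from_pretrained("openai/whisper-small").encoder
        self.classifier = nn.Linear(self.encoder.config.d_model, num_labels)

    def forward(self, input_features):
        # (batch, n_frames, d_model) -> mean-pool time axis -> logits
        hidden = self.encoder(input_features).last_hidden_state
        return self.classifier(hidden.mean(dim=1))

Training would then be something like logits = WhisperEncoderClassifier(num_labels=5)(features.input_features) followed by a standard cross-entropy loss. Does that seem like a reasonable place to attach the head, or should the decoder outputs be used instead?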

Thanks!