I would like to play around with the code for WhisperForAudioClassification. I believe I can improve it, and to prove that I need to write my own code.
But of course, I don’t have the resources to train the model from scratch. Looking at this repo, this is how to load the classification model for fine-tuning:
model = AutoModelForAudioClassification.from_pretrained(
    model_args.model_name_or_path,  # e.g. "openai/whisper-medium"
    from_tf=bool(".ckpt" in model_args.model_name_or_path),
    config=config,
    cache_dir=model_args.cache_dir,
    revision=model_args.model_revision,
    use_auth_token=True if model_args.use_auth_token else None,
    ignore_mismatched_sizes=model_args.ignore_mismatched_sizes,
)
...
The main part is AutoModelForAudioClassification, which loads a WhisperForAudioClassification model (code and weights). This is all good, but if I fine-tune the model this way, I’ll be doing so with the existing code, and I want to introduce my own.
Basically, what I would like to learn is how to load the “openai/whisper-medium” weights into my own class. Of course, the weights and the class should be compatible. To expand on that, the WhisperForAudioClassification class adds a head to the encoder part of Whisper. I want to write a new, different head while continuing to use the same encoder. Needless to say, the pretrained weights coming from “openai/whisper-medium” will only populate the encoder and not the head.
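To make this concrete, here is a rough sketch of the kind of class I have in mind. The names MyWhisperClassifier and build_from_pretrained, the mean pooling, and the layout of the head are placeholders of my own, not anything from the repo. I am also assuming that loading WhisperModel with from_pretrained and taking its .encoder attribute is a reasonable way to get the pretrained encoder. The smoke test at the bottom uses a tiny, randomly initialized config so that it runs without downloading anything.

```python
import torch
from torch import nn
from transformers import WhisperConfig, WhisperModel


class MyWhisperClassifier(nn.Module):
    """Hypothetical custom classification head on top of a Whisper encoder."""

    def __init__(self, encoder, hidden_size, num_labels):
        super().__init__()
        self.encoder = encoder  # pretrained (or, in the smoke test, random) encoder
        # Placeholder head: this is the part I want to experiment with.
        self.head = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, num_labels),
        )

    def forward(self, input_features):
        # Encoder output has shape (batch, frames, hidden_size).
        hidden = self.encoder(input_features).last_hidden_state
        pooled = hidden.mean(dim=1)  # average over the time axis
        return self.head(pooled)     # (batch, num_labels)


def build_from_pretrained(name, num_labels):
    """Assumed loading path: reuse only the encoder of the pretrained model.

    The head stays randomly initialized, e.g.
    build_from_pretrained("openai/whisper-medium", num_labels=10).
    """
    whisper = WhisperModel.from_pretrained(name)
    return MyWhisperClassifier(whisper.encoder, whisper.config.d_model, num_labels)


# Smoke test with a tiny random config (no download, no pretrained weights).
tiny_config = WhisperConfig(
    d_model=16,
    encoder_layers=1,
    decoder_layers=1,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=32,
    decoder_ffn_dim=32,
    num_mel_bins=8,
    max_source_positions=10,
)
tiny_encoder = WhisperModel(tiny_config).encoder
model = MyWhisperClassifier(tiny_encoder, tiny_config.d_model, num_labels=3)
model.eval()

# The encoder expects (batch, num_mel_bins, 2 * max_source_positions) features.
dummy_features = torch.randn(2, tiny_config.num_mel_bins, 2 * tiny_config.max_source_positions)
with torch.no_grad():
    logits = model(dummy_features)
```

With this layout the encoder parameters would come from the checkpoint, while everything under head would start from random initialization and be learned during fine-tuning.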
Can someone please help me learn how to populate my own code with pretrained weights?