I’m trying to learn how I can apply SpecAugment when fine-tuning a Whisper model. First of all, I’m not sure whether applying such augmentation is common practice for fine-tuning or not (maybe it’s only used for training from scratch?). And second, I don’t even know how to implement it.
I see that PyTorch has a library implementing the technique. But I guess I need to take into account the mask for each sample before using that library. And I don’t know how to do that.
I appreciate any guidance I can get. Thanks.
Hi, I share the same question. Have you managed to solve this issue?
I’m not sure if this is the right way to do it, but it seems to work:
from transformers import WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.config.forced_decoder_ids = None
model.config.suppress_tokens = []
model.config.apply_spec_augment = True
model.config.mask_time_prob = 0.05
model.config.mask_feature_prob = 0.05
The last three lines enable SpecAugment for the Whisper model. The right values for `mask_time_prob` and `mask_feature_prob` still need to be researched; note that the masking is only applied while the model is in training mode.
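To get an intuition for what those two probabilities control, here is a minimal NumPy sketch of SpecAugment-style masking. This is only an illustration of the idea (zeroing random spans along the time and mel-feature axes of a log-mel spectrogram), not the exact code `transformers` runs internally; the function name, the span length of 10, and the way the number of spans is derived from the probabilities are my own assumptions.

```python
import numpy as np

def spec_augment(spec, mask_time_prob=0.05, mask_feature_prob=0.05,
                 mask_length=10, rng=None):
    """SpecAugment-style sketch: zero random spans along the time axis
    and the feature (mel-bin) axis of a spectrogram.
    `spec` has shape (num_features, num_frames), e.g. (80, 3000) for Whisper."""
    rng = np.random.default_rng() if rng is None else rng
    spec = spec.copy()
    num_features, num_frames = spec.shape

    # Pick enough spans so that roughly mask_*_prob of each axis gets covered.
    num_time_masks = int(mask_time_prob * num_frames / mask_length)
    num_feat_masks = int(mask_feature_prob * num_features / mask_length)

    for _ in range(num_time_masks):
        start = rng.integers(0, max(1, num_frames - mask_length))
        spec[:, start:start + mask_length] = 0.0  # mask a span of frames
    for _ in range(num_feat_masks):
        start = rng.integers(0, max(1, num_features - mask_length))
        spec[start:start + mask_length, :] = 0.0  # mask a span of mel bins
    return spec

augmented = spec_augment(np.ones((80, 3000)), rng=np.random.default_rng(0))
print(augmented.shape)          # (80, 3000)
print((augmented == 0).any())   # True: some spans were zeroed out
```

One more thing worth knowing: for the time masking to respect padding, the feature extractor should also return an attention mask (pass `return_attention_mask=True` when calling the `WhisperFeatureExtractor`) and that mask should be forwarded to the model during training. I believe that answers the original question about handling the mask for each sample.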