How to apply SpecAugment to a Whisper model?

I’m trying to learn how to apply SpecAugment when fine-tuning a Whisper model. First of all, I’m not sure whether this kind of augmentation is common practice for fine-tuning (maybe it’s only used when training from scratch?). And second, I don’t know how to implement it.

I see that PyTorch has a library implementing the technique. But I guess I need to take into account the mask for each sample before using that library. And I don’t know how to do that.
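For the manual route, one way to respect each sample's real length is to mask every example on its own before batching. The following is only a minimal NumPy sketch, not a library API: the function `spec_augment`, its parameters (`valid_frames`, `num_time_masks`, `max_time_width`, `num_freq_masks`, `max_freq_width`) and the choice to fill masked bands with the sample mean are all assumptions made for illustration. It expects a single log-mel array of shape (n_mels, n_frames), such as one entry of the Whisper feature extractor's `input_features`.

```python
import numpy as np

def spec_augment(features, valid_frames=None, num_time_masks=2, max_time_width=50,
                 num_freq_masks=2, max_freq_width=10, rng=None):
    """Return a masked copy of one (n_mels, n_frames) log-mel array.

    Hypothetical helper: masks are drawn only inside the first
    `valid_frames` frames, so padded frames are left untouched.
    """
    rng = rng if rng is not None else np.random.default_rng()
    out = np.array(features, copy=True)
    n_mels, n_frames = out.shape
    # Number of frames that contain real audio (everything after is padding).
    limit = n_frames if valid_frames is None else min(int(valid_frames), n_frames)
    # Assumption: fill masked bands with the mean of the unmasked input.
    fill = out.mean()

    # Frequency masks: blank out a random band of mel bins.
    for _ in range(num_freq_masks):
        width = min(int(rng.integers(0, max_freq_width + 1)), n_mels)
        start = int(rng.integers(0, n_mels - width + 1))
        out[start:start + width, :limit] = fill

    # Time masks: blank out a random run of frames within the valid region.
    for _ in range(num_time_masks):
        width = min(int(rng.integers(0, max_time_width + 1)), limit)
        start = int(rng.integers(0, limit - width + 1))
        out[:, start:start + width] = fill

    return out
```

A function like this would be applied to training examples only, for instance in a dataset transform or data collator. `valid_frames` could be derived from the audio length; assuming Whisper's usual 16 kHz input and 160-sample hop, that is roughly `len(audio) // 160`.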

I appreciate any guidance I can get. Thanks.

Hi, I have the same question. Have you managed to solve this issue?

I’m not sure if this is the right way to do it, but it seems to work:

from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

model.config.forced_decoder_ids = None
model.config.suppress_tokens = []

# Enable SpecAugment and set how much to mask along the time and feature axes
model.config.apply_spec_augment = True
model.config.mask_time_prob = 0.05
model.config.mask_feature_prob = 0.05

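The last three lines enable SpecAugment for the Whisper model: apply_spec_augment switches it on, and mask_time_prob and mask_feature_prob control how much is masked along the time and feature (mel-frequency) axes. I haven’t worked out the best values; they still need to be researched.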
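Besides the two probabilities, the Whisper config in transformers also has fields for the length and minimum number of masks on each axis. The sketch below builds a bare `WhisperConfig` (no weights are downloaded) just to show those fields side by side. It assumes a transformers version that exposes them, and the values shown are placeholders, not tuned recommendations.

```python
from transformers import WhisperConfig

# Config only: this does not load or download any model weights.
config = WhisperConfig(
    apply_spec_augment=True,    # master switch for SpecAugment
    mask_time_prob=0.05,        # how much of the time axis gets masked
    mask_time_length=10,        # length of each time mask, in frames
    mask_time_min_masks=2,      # lower bound on the number of time masks
    mask_feature_prob=0.05,     # how much of the feature (mel) axis gets masked
    mask_feature_length=10,     # length of each feature mask, in mel bins
    mask_feature_min_masks=0,   # lower bound on the number of feature masks
)

print(config.apply_spec_augment, config.mask_time_prob, config.mask_feature_prob)
```

The same attributes can be set on `model.config` of a pretrained model, as in the snippet above. The masking is understood to be applied only while the model is in training mode (`model.train()`), so evaluation should be unaffected, but that is worth verifying against the installed version.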