Finetuned whisper model translating instead of transcribing

I have fine-tuned the Whisper small model on my custom dataset in the Kannada language and got a good WER during training, thanks to this great blog.
However, when I run inference on it using pipeline, it translates the audio into English rather than transcribing it to Kannada. Here is the gist of what I am doing in my code:

from datasets import load_dataset
from tqdm import tqdm
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline

processor = WhisperProcessor.from_pretrained("openai/whisper-small", language="Kannada", task="transcribe")
tokenizer = processor.tokenizer
feature_extractor = processor.feature_extractor
model = WhisperForConditionalGeneration.from_pretrained('<repo of my finetuned model>', use_auth_token=True)
pipe = pipeline(task='automatic-speech-recognition', model=model, tokenizer=tokenizer, feature_extractor=feature_extractor, device=0)
dataset = load_dataset('<my dataset>', split='test')

transcriptions = {'y': [], 'pred': []}

def process_clip(clip):
    transcriptions['y'].append(clip['sentence'])
    transcriptions['pred'].append(pipe(clip['audio'])['text'])

for clip in tqdm(dataset):
    process_clip(clip)
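One thing worth checking (a sketch, not a confirmed fix for this thread): the pipeline only uses the processor's language/task settings at training time, so at inference the decoder can still default to translation. Recent transformers versions let you pin the language and task per call via generate_kwargs. The pipe and clip names below are the same ones as in the snippet above:

```python
# Sketch: force the decoder prompt so generation transcribes in Kannada
# instead of translating to English. The actual pipeline call is commented
# out because it needs the fine-tuned checkpoint; the generate_kwargs dict
# is the key part.
generate_kwargs = {"language": "kannada", "task": "transcribe"}

# text = pipe(clip["audio"], generate_kwargs=generate_kwargs)["text"]
```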

Any help will be appreciated, thanks for taking the time!!!

Were you able to find a solution? Is this only a Hugging Face symptom, or does the problem occur in other API environments as well? Does the problem exist with the small model only?

I wasn’t able to find a solution. The issue did not occur with the medium model. I haven’t experimented with other API environments.
