Fine-tuning Whisper on custom special tokens

Hey @sanchit-gandhi, it was amazing to read your blog post on Whisper and how to fine-tune it. I had a question: is it possible to fine-tune Whisper using special tokens?

For example: I know Whisper was trained with tokens that mark certain parts of the speech and condition the prediction accordingly, and that it is possible to augment the embedding layer and train Whisper with additional custom tokens on a modified dataset.

But I specifically wanted to know whether it is indeed possible to fine-tune Whisper with these special tokens, for example || etc. If you have any advice or suggestions to share, it would mean a lot.

I couldn’t find much information on the internet or in the OpenAI discussions, apart from a few hints. Any advice you could share would help me train it for my specific need.
