Wav2Vec2 For Swedish

The KTH Division of Speech, Music and Hearing is working on a Swedish Wav2Vec2 model.

Our first step will be to fine-tune the multilingual model on Swedish voice recordings and their transcriptions to build a Swedish speech-to-text model.

You are welcome to help out. This thread might also be useful for people working on similar tasks in other languages.
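For anyone wondering what preparing the Swedish transcriptions for this kind of fine-tuning might look like, here is a minimal sketch of building a character-level vocabulary for a CTC output layer. It is not our actual pipeline: the sample sentences, the punctuation set, and the helper names `normalize` and `build_vocab` are all made up for illustration, and the special tokens `|`, `[UNK]` and `[PAD]` are just one common convention.

```python
import re

# Hypothetical sample transcriptions; a real run would read them
# from a Swedish speech corpus instead.
transcripts = [
    "Hej, hur mår du?",
    "Jag bor i Göteborg.",
    "Det är kallt ute i dag!",
]

# Which punctuation to strip is an assumption; adjust for your data.
PUNCTUATION = re.compile(r"[,.?!;:\"\-]")


def normalize(text):
    """Lowercase and drop punctuation, keeping Swedish letters such as å, ä, ö."""
    return PUNCTUATION.sub("", text).lower().strip()


def build_vocab(texts):
    """Map every character to an integer id and append the special tokens."""
    chars = sorted(set("".join(normalize(t) for t in texts)))
    vocab = {ch: i for i, ch in enumerate(chars)}
    # Use a visible word delimiter in place of the space character.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    # Unknown-character token and padding token (assumed names).
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


vocab = build_vocab(transcripts)
print(vocab)
```

A dictionary like this could be written to a JSON file and used as the vocabulary of a character tokenizer, so that the size of the model's output layer matches `len(vocab)`.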



I’ve added most of 🤗's Wav2Vec2 code and I’m more than happy to help with fine-tuning the multilingual checkpoint for Swedish. Feel free to tag me in this thread if you have any questions.

Also, I’m planning to release a detailed notebook about fine-tuning Wav2Vec2 in a couple of days, which I’ll link here.


Hi @patrickvonplaten, I am also interested in fine-tuning wav2vec2ctc and would like to see the notebook. Thank you!


Checking on when this notebook would be ready for us mere mortals. :slight_smile:

The notebook/blog was released last week:

Fine-Tune Wav2Vec2-base on TIMIT
Fine-Tune Wav2Vec2 for English ASR in Hugging Face with 🤗 Transformers
Fine-Tune XLSR-Wav2Vec2-large on Common Voice
Fine-Tune XLSR-Wav2Vec2 for low-resource ASR with 🤗 Transformers


@patrickvonplaten @valhalla thanks!
This is very good news!
It’s an easy way to get started with speech recognition.

This event: [Open-to-the-community] XLSR-Wav2Vec2 Fine-Tuning Week for Low-Resource Languages might be interesting for you as well!