Indonesian ASR: Fine-Tuning Wav2Vec2

ayameRushia · March 21, 2021, 9:00am

I’m in. It should be easier to reach a lower WER if we combine our efforts, but how do we decide who works on what?

Wikidepia · March 21, 2021, 9:12am

Here are some samples from YouTube closed captions: http://148.251.140.232/youtube-asr/

Wikidepia · March 21, 2021, 9:13am

Maybe we could create a Telegram group?

Galuh · March 21, 2021, 9:33am

I also requested a dataset yesterday: TITML-IDN, from the Speech Resources Consortium. Will update if I hear back.

If everyone uses Telegram, I can create a group. Feel free to send me a message with your username and I’ll add you. Maybe we can divide our work there. I was thinking some of us could each focus on varying one hyperparameter (e.g. number of epochs, learning rate) while keeping the other hyperparameters fixed, and the others could focus on integrating additional datasets (when available). Then we can combine the best results of our experiments for the final model. What do you all think?
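A minimal sketch of how the "vary one hyperparameter, keep the rest fixed" split could be written down so each person gets a clear list of runs. The baseline values and sweep ranges below are placeholders for illustration only, not settings agreed in this thread, and `one_at_a_time` is a hypothetical helper name.

```python
# Hypothetical baseline config; the values are placeholders, not agreed settings.
BASELINE = {
    "num_train_epochs": 30,
    "learning_rate": 3e-4,
    "per_device_train_batch_size": 16,
}

# Hypothetical values to try for each hyperparameter.
SWEEPS = {
    "num_train_epochs": [30, 60],
    "learning_rate": [1e-4, 3e-4, 5e-4],
}


def one_at_a_time(baseline, sweeps):
    """Return (varied_name, config) pairs that differ from the baseline
    in exactly one hyperparameter."""
    runs = []
    for name, values in sweeps.items():
        for value in values:
            if value == baseline[name]:
                continue  # the baseline itself is a shared reference run
            config = dict(baseline)  # copy so the baseline is never mutated
            config[name] = value
            runs.append((name, config))
    return runs


runs = one_at_a_time(BASELINE, SWEEPS)
for name, config in runs:
    print(name, config)
```

Each person could take all the runs for one hyperparameter name, and everyone compares against the same baseline run.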

cahya · March 21, 2021, 10:38am

It looks nice, but in many clips the audio runs longer than the text in the JSON file. For example, in the sound file http://148.251.140.232/youtube-asr/056yb0seQWc/10517504935501761652288671477516212840.mp3, the speaker says: “karena di sekolah lamanya dia kedapatan merampok dan hampir membunuh seorang de…”, while the JSON file only has the text “karena di sekolah lamanya dia kedapatan”
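One rough way to catch clips like this automatically is to flag any clip whose transcript is suspiciously short for its audio length. The sketch below assumes each record carries a clip duration and its caption text; the field names (`id`, `duration_sec`, `text`), the sample durations, and the 8 characters-per-second threshold are all made up for illustration and would need tuning against the real data.

```python
def flag_short_transcripts(records, min_chars_per_sec=8.0):
    """Return ids of clips whose transcript has fewer characters per second
    of audio than the threshold, i.e. the text is probably truncated."""
    flagged = []
    for rec in records:
        duration = rec["duration_sec"]
        if duration <= 0:
            flagged.append(rec["id"])  # unusable clip, flag it for review
            continue
        rate = len(rec["text"]) / duration
        if rate < min_chars_per_sec:
            flagged.append(rec["id"])
    return flagged


# Hypothetical records: same caption text, different (invented) durations.
records = [
    {"id": "clip_a", "duration_sec": 8.0, "text": "karena di sekolah lamanya dia kedapatan"},
    {"id": "clip_b", "duration_sec": 3.0, "text": "karena di sekolah lamanya dia kedapatan"},
]

flagged = flag_short_transcripts(records)
print(flagged)
```

This is only a heuristic: slow speakers or long pauses would also be flagged, so flagged clips are candidates for manual review rather than automatic removal.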

cahya · March 21, 2021, 10:41am

My Telegram name is @CahyaWirawan

ayameRushia · March 21, 2021, 11:29am

My telegram is @magungh1

Wikidepia · March 21, 2021, 12:17pm

My telegram username is @Wikidepia

cahya · March 21, 2021, 7:50pm

Hi @munggok, I see you already started with wav2vec a few days ago. Do you want to join our discussion here?

munggok · March 21, 2021, 10:51pm

Hi @cahya, sure, count me in.
Would love to join.

munggok · March 21, 2021, 11:05pm

I managed to fine-tune xlsr-indonesia a week ago and got an eval WER of 0.401.
The log: trainer_state.json · munggok/xlsr_indonesia at main
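For anyone new to the metric: WER is the word-level edit distance (substitutions + deletions + insertions) between the reference and the model output, divided by the number of reference words. A minimal pure-Python sketch for a single utterance follows; it is an illustration of the definition, not the evaluation code used for the number above. The example sentence is borrowed from the caption issue discussed earlier in the thread.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance: word-level Levenshtein distance
    divided by the number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # turning i reference words into an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


score = wer(
    "karena di sekolah lamanya dia kedapatan merampok",
    "karena di sekolah lamanya dia kedapatan",
)
print(round(score, 3))
```

Note that a corpus-level WER is usually computed from total errors over total reference words, which is not the same as averaging per-utterance scores.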

Basically, I followed the same steps as Patrick’s tutorial for fine-tuning XLSR, but trained a bit longer (changed the number of epochs to 60) and used a larger batch size (32).

As for improvements, I agree: we probably need additional datasets beyond Common Voice.

Adding a link to an Indonesian speech dataset that hasn’t been posted here yet (CMIIW)

cahya · March 22, 2021, 5:37am

Great! Please also join our Telegram channel, where we chat more, and see the list where we split up our tasks: Fine-Tune XLSR-Wav2Vec2 on Indonesian ASR with 🤗 Transformers - Checklist - Google Sheets

Galuh · March 22, 2021, 5:49am

What’s your Telegram ID? I’ll add you to the group

munggok · March 22, 2021, 5:55am

Okay

My Telegram ID is @acul3

yasirabd · March 25, 2021, 4:52pm

Hi everyone, sorry I’m late to the discussion.
I’m new to this topic and hope to learn a lot from it.

I’ve read the fine-tuning checklist in the Google spreadsheet. I’d like to contribute by combining the Common Voice dataset with Shaip.

Please add me to the Telegram group: @yasirabdr.

zaraaiw · March 1, 2023, 2:44am

This blog is very useful and relevant to an article I’ve read. For more detail, you can visit https://devel.pusatbahasa.unair.ac.id/V1/sejarah-singkat/
