Hey @Rakib,
To answer 1.)
That depends on your use case. If you just want the best, “fair” model evaluated on DatasetA “test”, then I would train your LM on DatasetA “train”. If you want the most general model, then I’d try to use as much diverse language data as possible. As a reference, this blog post might help: Boosting Wav2Vec2 with n-grams in 🤗 Transformers
In general though, it’s not unusual to see such improvements, and as long as you don’t use the test transcripts in your LM training data, it should be fine!
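In case it helps, here is a minimal sketch of attaching such an n-gram to your processor with pyctcdecode, as in the blog above. The checkpoint name and the `5gram.arpa` path are placeholders for your own fine-tuned model and the LM you train on DatasetA “train”:

```python
from transformers import AutoProcessor, Wav2Vec2ProcessorWithLM
from pyctcdecode import build_ctcdecoder

# placeholder: your fine-tuned Wav2Vec2 checkpoint
processor = AutoProcessor.from_pretrained("my-wav2vec2-model")

# labels must be ordered by token id for the CTC decoder
vocab_dict = processor.tokenizer.get_vocab()
sorted_vocab = [k for k, _ in sorted(vocab_dict.items(), key=lambda item: item[1])]

# beam-search decoder that rescores hypotheses with the n-gram LM
decoder = build_ctcdecoder(
    labels=sorted_vocab,
    kenlm_model_path="5gram.arpa",  # placeholder: LM trained on DatasetA "train" text only
)

processor_with_lm = Wav2Vec2ProcessorWithLM(
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
    decoder=decoder,
)
processor_with_lm.save_pretrained("my-wav2vec2-model-with-lm")
```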
-
You could use a spelling corrector such as oliverguhr/spelling-correction-english-base · Hugging Face as a post-processor, among other options, and noise-cancellation filters as pre-processors.
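For the spelling corrector, something along these lines should work as a rough sketch (the example transcript is made up, and I’m assuming the Hub model is used via the text2text-generation pipeline):

```python
from transformers import pipeline

# load the spelling-correction model from the Hub
corrector = pipeline(
    "text2text-generation",
    model="oliverguhr/spelling-correction-english-base",
)

# placeholder ASR output with typical recognition errors
raw_transcript = "the quik brown focks jumps ofer the lasy dog"

# post-process the transcript
corrected = corrector(raw_transcript, max_length=128)[0]["generated_text"]
print(corrected)
```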
-
No, I wouldn’t recommend creating your own tokenizer. Instead, I’d just create a character-based lookup table for Wav2Vec2 as described here: Fine-Tune XLSR-Wav2Vec2 for low-resource ASR with 🤗 Transformers
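A rough sketch of what that lookup table looks like, following the blog (`train_transcripts` is a stand-in for your DatasetA “train” text column):

```python
import json
from transformers import Wav2Vec2CTCTokenizer

# placeholder: replace with the transcripts from DatasetA "train"
train_transcripts = ["hello world", "fine tune wav2vec2"]

# collect every character that appears in the training transcripts
chars = sorted(set("".join(train_transcripts)))
vocab = {c: i for i, c in enumerate(chars)}

# Wav2Vec2 conventions: "|" as word delimiter, plus unk/pad tokens
vocab["|"] = vocab.pop(" ")
vocab["[UNK]"] = len(vocab)
vocab["[PAD]"] = len(vocab)

with open("vocab.json", "w") as f:
    json.dump(vocab, f)

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json",
    unk_token="[UNK]",
    pad_token="[PAD]",
    word_delimiter_token="|",
)
```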