Boosting Wav2Vec2-xls-r with an N gram decoder using the transcripts used to train wav2vec2

Rakib · July 8, 2022, 5:58pm

Hi,
I would be really thankful if you could answer the following queries. I have been finetuning wav2vec2 and then boosting the performance with an n-gram decoder.

1. If the wav2vec2 model is trained on DatasetA, is it wise to train an n-gram LM on the transcripts of DatasetA to boost the model's performance? I have tried this, and it reduces WER and CER significantly, as expected. However, will the model generalize less well or perform worse on unseen data? Or is it better to train the n-gram LM on text that is similar to DatasetA but not DatasetA itself?
2. What preprocessing and postprocessing modules can I use to improve the performance of the wav2vec2 model? Could you point me to some resources?
3. Will I get any benefit from creating my own tokenizer for my dataset? Will it improve my model's performance? Can I fine-tune pretrained models with this tokenizer, or do I have to train from scratch?

@patrickvonplaten

patrickvonplaten · July 26, 2022, 3:40pm

Hey @Rakib,

To answer 1.)
That depends on your use case. If you just want the best “fair” model evaluated on the DatasetA “test” split, then I would train your LM on the DatasetA “train” split. If you want the most general model, then I’d use as much diverse language data as possible. As a reference, this blog post might help: Boosting Wav2Vec2 with n-grams in 🤗 Transformers

In general, though, it’s not unusual to see such improvements, and as long as you don’t use the test transcripts in your LM training data, it should be fine!
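
As a rough illustration of the point above about keeping test transcripts out of the LM data, here is a minimal sketch in plain Python. The function names `normalize` and `build_lm_corpus` and the sample sentences are hypothetical, and the normalization rules are an assumption; in practice they should match whatever text cleaning was applied to the transcripts when fine-tuning the acoustic model.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation (except apostrophes), collapse whitespace.

    Assumption: this mirrors the cleaning applied to the fine-tuning transcripts.
    """
    text = text.lower()
    text = re.sub(r"[^\w\s']", "", text)
    return re.sub(r"\s+", " ", text).strip()


def build_lm_corpus(train_transcripts, test_transcripts, extra_text=()):
    """Return normalized LM training lines, skipping anything found in the test set."""
    held_out = {normalize(t) for t in test_transcripts}
    corpus = []
    for line in list(train_transcripts) + list(extra_text):
        norm = normalize(line)
        # Skip empty lines and sentences that also occur in the test split.
        if norm and norm not in held_out:
            corpus.append(norm)
    return corpus


# Hypothetical toy data, for illustration only.
corpus = build_lm_corpus(
    train_transcripts=["Hello world.", "The cat sat on the mat!"],
    test_transcripts=["the cat sat on the mat"],
    extra_text=["A sentence from another domain."],
)
print(corpus)
```

The resulting lines can be written one per line to a text file and passed to an n-gram toolkit such as KenLM, as the linked blog post does. Adding `extra_text` from other sources is one way to move toward the "most general" setup described above.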

To answer 2.)
You could use a spelling corrector such as oliverguhr/spelling-correction-english-base · Hugging Face as a postprocessor, among others, and noise-cancellation filters as preprocessors.

To answer 3.)
No, I wouldn’t recommend creating your own tokenizer. Instead, I’d just create a character-based lookup table for Wav2Vec2 as described here: Fine-Tune XLSR-Wav2Vec2 for low-resource ASR with 🤗 Transformers
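
For readers who want a picture of what a character-based lookup table looks like, here is a minimal sketch in plain Python. The function name `build_char_vocab` and the sample transcripts are hypothetical. The use of `|` as the word delimiter and of `[UNK]` and `[PAD]` as special tokens follows the convention in the linked fine-tuning blog post, but treat the details as assumptions to check against that post.

```python
def build_char_vocab(transcripts):
    """Map every character seen in the transcripts to an integer id."""
    chars = sorted(set("".join(transcripts)))
    vocab = {ch: idx for idx, ch in enumerate(chars)}

    # Use a visible word delimiter in place of the space character.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")

    # Special tokens: unknown character and padding (the CTC blank token).
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


# Hypothetical toy transcripts, already normalized.
vocab = build_char_vocab(["hello world", "good day"])
print(vocab)
```

A dictionary like this can be saved as a JSON file and used to set up the CTC tokenizer before fine-tuning a pretrained checkpoint, so there is no need to train from scratch.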
