I would be really thankful if you could answer the following queries. I have been fine-tuning wav2vec2 and then boosting its performance with an n-gram decoder.
- If the wav2vec2 model is fine-tuned on DatasetA, is it wise to train an n-gram LM on the transcripts of DatasetA to boost the model's performance? I have tried this and it reduces WER and CER significantly, as expected. However, will the model be less generalized or perform worse on unseen data? Or is it better to train the n-gram LM on text data that is similar to DatasetA but not exactly DatasetA?
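For reference, here is a minimal pure-Python sketch of the word- and character-level edit-distance metrics I am comparing between the two setups (illustrative only; my actual evaluation may use a metrics library, and the function names here are my own):

```python
def edit_distance(ref, hyp):
    # classic dynamic-programming Levenshtein distance over two sequences
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            # deletion, insertion, or substitution/match
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (ref[i - 1] != hyp[j - 1]))
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    # word error rate: word-level edits divided by reference length
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    # character error rate: character-level edits divided by reference length
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

This is just to make explicit what "reduces WER and CER" means in my comparison.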
- What are some preprocessing and post-processing modules that I can use to improve the performance of a wav2vec2 model? Could you please point me to some resources?
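To make this question concrete, the only preprocessing I currently apply is simple transcript normalization before building the CTC labels; a hypothetical sketch of what I mean (the character set and helper name are my own choices):

```python
import re

# punctuation I strip so it does not inflate the CTC vocabulary (my own choice of characters)
CHARS_TO_REMOVE = r"[\,\?\.\!\-\;\:\"]"

def normalize_transcript(text):
    # lowercase and strip punctuation
    text = re.sub(CHARS_TO_REMOVE, "", text).lower()
    # collapse whitespace left over after punctuation removal
    return re.sub(r"\s+", " ", text).strip()
```

I am asking whether there are more substantial preprocessing (e.g. on the audio side) or post-processing steps beyond this kind of text cleanup.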
- Will I get any benefit from creating my own tokenizer for my own dataset? Will it increase the performance of my model? Can I fine-tune pretrained models using this tokenizer, or would I have to train from scratch?
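By "my own tokenizer" I mean building a character-level vocabulary from my dataset's transcripts, roughly along these lines (a minimal sketch; the function name and special-token choices are mine, following the common CTC fine-tuning convention of using "|" as the word delimiter):

```python
def build_vocab(transcripts):
    # collect every character that occurs in the dataset transcripts
    chars = sorted(set("".join(transcripts)))
    vocab = {c: i for i, c in enumerate(chars)}
    # replace the space token with "|", the usual word delimiter for wav2vec2 CTC
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    # special tokens required by the CTC tokenizer
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab
```

My question is whether swapping in such a vocabulary still lets me start from a pretrained checkpoint, or whether the mismatch with the original tokenizer forces training from scratch.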