Wav2Vec2 doubts

alvoc · January 24, 2023, 1:24pm

Hello, I have some questions about Wav2Vec2.

I have one fine-tuned model without an LM and another with an LM, and even the one with the LM keeps returning words that are not in the word list. I’ve read that the beam search decoder does not prevent the model from returning an invented word that does not exist in the word list, and that the LM only helps with re-punctuation and misspellings. I’ve also seen a way to restrict the output to words from a lexicon, but that doesn’t work well either.
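To make the point about invented words concrete, here is a minimal sketch of character-level CTC decoding, assuming a toy vocabulary and made-up frame predictions (`VOCAB`, `frames`, and `snap_to_lexicon` are hypothetical names for illustration, not part of any library). Because the output is assembled character by character, nothing stops it from spelling a string that is not in the word list. The lexicon "snap" at the end is only a crude post-hoc fix, not the constrained decoding that real lexicon decoders do.

```python
from itertools import groupby

# Hypothetical toy vocabulary: index 0 is the CTC blank, "|" is the word delimiter.
VOCAB = ["<pad>", "|", "a", "c", "k", "t"]


def ctc_greedy_decode(frame_ids, vocab=VOCAB, blank_id=0):
    """Collapse repeated frame labels, drop blanks, and map "|" to a space."""
    collapsed = [idx for idx, _ in groupby(frame_ids)]
    chars = [vocab[i] for i in collapsed if i != blank_id]
    return "".join(chars).replace("|", " ").strip()


def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def snap_to_lexicon(text, lexicon):
    """Replace each out-of-lexicon word by its closest lexicon entry."""
    candidates = sorted(lexicon)  # sorted so ties are broken deterministically
    return " ".join(
        w if w in lexicon else min(candidates, key=lambda v: edit_distance(w, v))
        for w in text.split()
    )


# Made-up per-frame argmax ids: k k <blank> a t t | a
frames = [4, 4, 0, 2, 5, 5, 1, 2]
raw = ctc_greedy_decode(frames)
fixed = snap_to_lexicon(raw, {"cat", "a", "tack"})
print(raw)    # kat a
print(fixed)  # cat a
```

The raw output contains "kat", which is not a word in the lexicon, even though every character was a valid vocabulary entry. Snapping to the nearest lexicon entry recovers "cat" here, but it can just as easily pick the wrong word when the true word is not in the lexicon at all.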

My first question is about how the model handles OOV words during decoding. Does the model need to see a word at least a few times in training to learn its speech representation, so that the transcribed word makes sense and the LM can help? And if the word never appeared in training, is it true that the LM can do nothing in that case, because the model doesn’t “know” the word?
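One way to picture what the LM can and cannot do is a toy rescoring step. This is a sketch under stated assumptions: the candidate list, the log-probabilities, and the weights `alpha` and `beta` are all made up, and the combined score (acoustic score plus weighted LM score plus a word-count bonus) is a common shallow-fusion form rather than the exact formula of any particular decoder.

```python
def rescore(candidates, lm_logprob, alpha=0.5, beta=1.0, unk_logprob=-10.0):
    """Pick the best candidate by acoustic score + alpha * LM score + beta * word count.

    candidates: list of (text, acoustic_logprob) pairs proposed by the acoustic model.
    lm_logprob: hypothetical unigram LM as a dict of word -> log-probability.
    Words unknown to the LM get the flat penalty unk_logprob.
    """
    best_text, best_score = None, float("-inf")
    for text, acoustic_logp in candidates:
        words = text.split()
        lm_score = sum(lm_logprob.get(w, unk_logprob) for w in words)
        total = acoustic_logp + alpha * lm_score + beta * len(words)
        if total > best_score:
            best_text, best_score = text, total
    return best_text, best_score


# Made-up unigram LM and beam candidates.
lm = {"the": -1.0, "cat": -2.0, "sat": -2.0}
beam = [("the kat sat", -1.0), ("the cat sat", -1.5)]

best_text, best_score = rescore(beam, lm)
print(best_text, best_score)  # the cat sat -1.0
```

The LM flips the ranking towards "the cat sat" even though the acoustic model preferred "the kat sat". It can only choose among hypotheses the acoustic model proposed, though: if no candidate in the beam contains the right word, rescoring cannot produce it.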

My second question is, I think, related. When you fine-tune the model on domain-specific words, are you making it “good” only in that context? Is that why a test set with words not seen in training gives worse results than a test set from the same domain?
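A simple way to check this on your own data is to measure, for each test set, how many of its words never occur in the training transcripts, alongside the word error rate. Below is a minimal sketch, assuming non-empty, whitespace-tokenised transcripts. The example sentences are invented, and `oov_rate` and `word_error_rate` are helper names written for this illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)


def oov_rate(train_transcripts, test_transcripts):
    """Fraction of test word tokens that never occur in the training transcripts."""
    train_vocab = {w for t in train_transcripts for w in t.split()}
    test_words = [w for t in test_transcripts for w in t.split()]
    return sum(w not in train_vocab for w in test_words) / len(test_words)


# Invented example data: a small "medical" training set and two test sets.
train = ["the patient has fever", "the patient is stable"]
in_domain = ["the patient is stable"]
out_of_domain = ["the market is volatile"]

print(oov_rate(train, in_domain))      # 0.0
print(oov_rate(train, out_of_domain))  # 0.5
print(word_error_rate("the market is volatile", "the marked is stable"))  # 0.5
```

If the out-of-domain test set shows both a higher OOV rate and a higher WER, that is consistent with the gap coming from unseen vocabulary, although acoustic differences between the sets (speakers, microphones, noise) can contribute as well.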

Thank you

