Hello, I implemented wav2vec 2.0 code, but no language model is used for decoding. How can I add a language model (say, one trained with KenLM) for decoding, @patrickvonplaten?
Thanks in advance.
Note: I also opened an issue, but redirected here.
Yeah good question - we currently don’t support evaluating with a language model, but we plan on adding this functionality soon! It’s sadly not that trivial to decode a CTC model with a language model. I’ll try to keep you posted for updates here!
Assuming that one already has a KenLM model, am I wrong to assume that it's just a matter of giving the wav2vec2 output logits as an argument to the ctcdecode main function, exemplified here: GitHub - parlance/ctcdecode: PyTorch CTC Decoder bindings?
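Roughly something like this, I mean. A minimal sketch, assuming a `ctcdecode.CTCBeamDecoder` has already been built from the tokenizer labels and a KenLM binary; note that with the default `log_probs_input=False` the decoder expects softmax probabilities, not raw logits:

```python
import torch

# logits: (batch, time, vocab) output of a Wav2Vec2ForCTC forward pass
probs = torch.softmax(logits, dim=-1)

# decoder: a ctcdecode.CTCBeamDecoder constructed with the tokenizer labels
# and a KenLM model (see the fuller sketch further down the thread)
beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs)

# label ids of the best beam for the first utterance
best_beam = beam_results[0][0][: out_lens[0][0]]
```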
Hi all, I’ve been experimenting with KenLM and wav2vec2; here is the notebook.
I don’t know if this is a proper implementation, but it works!
I also still need to clean up some stuff like the vocab and other things.
@jolurf you can also use this decoder (GitHub - parlance/ctcdecode: PyTorch CTC Decoder bindings). Take the labels from your tokenizer and create an n-gram language model with KenLM. After that you can feed the logits from your Wav2Vec2 model into the decoder.
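Roughly like this (a sketch, not a drop-in solution: the checkpoint name, the `lm.binary` path and the alpha/beta values are placeholders to be tuned, and the tokenizer’s `|` word delimiter is mapped to a space so the KenLM scorer can see word boundaries):

```python
import torch
from ctcdecode import CTCBeamDecoder
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Labels in vocabulary-id order; map wav2vec2's "|" word delimiter to a space
# so the decoder's KenLM scoring recognises word boundaries.
vocab = processor.tokenizer.convert_ids_to_tokens(
    list(range(processor.tokenizer.vocab_size))
)
labels = [" " if token == "|" else token for token in vocab]

decoder = CTCBeamDecoder(
    labels,
    model_path="lm.binary",                     # KenLM n-gram binary (placeholder path)
    alpha=0.5,                                  # LM weight (tune on a dev set)
    beta=1.0,                                   # word insertion bonus (tune on a dev set)
    beam_width=100,
    blank_id=processor.tokenizer.pad_token_id,  # the pad token acts as the CTC blank
    log_probs_input=False,                      # we pass softmax probabilities below
)

# Dummy 1-second waveform at 16 kHz; replace with your real audio.
speech = torch.randn(16_000)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values).logits  # (batch, time, vocab)

probs = torch.softmax(logits, dim=-1)
beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs)

# Best beam of the first utterance, converted back to text.
best = beam_results[0][0][: out_lens[0][0]]
transcription = "".join(labels[i] for i in best)
```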
@patrickvonplaten Are there any updates on the transformer language model?
Hi all! As advised by @andersgb1, I used a KenLM n-gram language model on top of a distilled wav2vec2 that I trained, and it improved my WER (26 → 12.6). If you guys are interested, here's the notebook (it runs seamlessly on Colab): OthmaneJ/distil-wav2vec2 · Hugging Face
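For reference, that kind of WER comparison can be computed with the jiwer package, along these lines (the strings here are only illustrative, not the actual evaluation data):

```python
from jiwer import wer

references = ["the cat sat on the mat"]   # ground-truth transcripts
greedy_preds = ["the cat sad on a mat"]   # plain argmax decoding
lm_preds = ["the cat sat on the mat"]     # beam search + KenLM decoding

print("greedy WER:", wer(references, greedy_preds))
print("LM WER:", wer(references, lm_preds))
```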
Is integrating an LM with wav2vec2 basically pointless now with the release of HuBERT, which, if I understand correctly, is both an audio model and a language model at the same time? facebook/hubert-xlarge-ll60k · Hugging Face
I’m trying to achieve a sub-5% WER (surpassing human performance), but I don’t know whether fine-tuning this HuBERT on my own data will get there, because I’m not sure about the language model part.
Does it also need to be integrated with a language model to perform well?