Language model for wav2vec2.0 decoding

EmreOzkose · March 16, 2021, 6:02am

Hello, I implemented the wav2vec2.0 code, but no language model is used for decoding. How can I add a language model (say, one trained with KenLM) for decoding, @patrickvonplaten?

Thanks in advance.

Note: I also opened an issue, but was redirected here.

patrickvonplaten · March 16, 2021, 2:31pm

Hey Emre!

Yeah good question - we currently don’t support evaluating with a language model, but we plan on adding this functionality soon! It’s sadly not that trivial to decode a CTC model with a language model. I’ll try to keep you posted for updates here!
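
For reference, decoding without a language model usually means greedy CTC decoding: take the most likely label per frame, collapse repeats, and drop the blank. A minimal sketch, assuming a toy vocabulary with the blank at index 0 (the names `ctc_greedy_decode` and `id_to_char` are illustrative and not part of transformers):

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse consecutive repeated labels, then drop the CTC blank."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)


# Hypothetical toy vocabulary; index 0 plays the role of the CTC blank.
id_to_char = {0: "", 1: "c", 2: "a", 3: "t", 4: " "}

# Stand-in for the per-frame argmax over the model's logits.
frame_ids = [1, 1, 0, 2, 2, 2, 0, 0, 3, 3]

print(ctc_greedy_decode(frame_ids, id_to_char))  # prints "cat"
```

The argmax throws away every alternative hypothesis, so a language model has nothing to choose between at this stage. LM decoding therefore needs a beam search over the frame probabilities, which is where the extra complexity comes from.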

andersgb1 · April 17, 2021, 1:03pm

Assuming that one already has a KenLM model, am I wrong to assume that it’s just a matter of passing the wav2vec2 output logits to the ctcdecode main function, as exemplified here: GitHub - parlance/ctcdecode: PyTorch CTC Decoder bindings?

Or is there more to it than that?
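
As background to this question, LM-aware CTC decoders typically rank hypotheses by the acoustic score plus a weighted LM score plus a word-count bonus. The sketch below applies that scoring rule to a fixed candidate list (N-best rescoring), which is simpler than the fusion a beam-search decoder does while it searches. The toy unigram table, the candidate scores, and the function names are all made up for illustration; `alpha` and `beta` follow the usual naming for the LM weight and word bonus.

```python
# Hypothetical unigram "language model": log-probabilities per word.
TOY_LM = {"the": -1.0, "cat": -2.0, "sat": -2.2, "kat": -6.0}
UNK_LOGPROB = -7.0  # assumed score for out-of-vocabulary words


def lm_logprob(sentence):
    """Sum of per-word log-probabilities under the toy LM."""
    return sum(TOY_LM.get(word, UNK_LOGPROB) for word in sentence.split())


def rescore(candidates, alpha=0.5, beta=1.0):
    """Return the text maximising acoustic + alpha * LM + beta * word count."""
    def total(item):
        text, acoustic_logprob = item
        return acoustic_logprob + alpha * lm_logprob(text) + beta * len(text.split())
    return max(candidates, key=total)[0]


# (text, acoustic log-probability) pairs; values are invented for the example.
candidates = [("the kat sat", -3.0), ("the cat sat", -3.4)]

print(rescore(candidates, alpha=0.0))  # acoustic score only: "the kat sat"
print(rescore(candidates, alpha=0.5))  # with the LM: "the cat sat"
```

In practice `alpha` and `beta` are tuned on held-out data, so even with a ready-made decoder there is usually a tuning step beyond passing in the logits.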

sandersbud4 · April 19, 2021, 7:56am

@EmreOzkose Good question, I think. I’m not working on this at a professional level yet, but I’m currently looking into it.

Wikidepia · April 22, 2021, 10:41pm

Hi all, I’ve been experimenting with KenLM and wav2vec2; here is the notebook.
I don’t know if this is a proper implementation, but it works!
I also still need to clean up a few things, such as the vocab.

SamuelAzran · April 23, 2021, 7:39am

@Wikidepia Can you share how much it improved your WER score? Also, did you try a character-level LM as well?

Wikidepia · April 24, 2021, 8:43am

It improved from 14.2 to 9.2. I haven’t tried a character-level LM.
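
For readers comparing the numbers in this thread: WER is the word-level edit distance between reference and hypothesis, divided by the number of reference words. Assuming the figures above are WER percentages, going from 14.2 to 9.2 is roughly a 35% relative reduction. A minimal sketch of the metric (the function name `wer` is illustrative, and the reference is assumed to be non-empty):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


print(wer("the cat sat", "the kat sat"))  # one substitution out of three words
print((14.2 - 9.2) / 14.2)                # relative reduction, about 0.35
```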

Edresson · April 24, 2021, 4:54pm

I added support for KenLM using the flashlight library here: Wav2Vec-Wrapper/test.py at main · Edresson/Wav2Vec-Wrapper · GitHub

It supports using the KenLM binary file instead of the ARPA file, and it is also possible to restrict the model’s vocabulary.

EmreOzkose · April 26, 2021, 5:40am

Thank you @Wikidepia and @Edresson. I will check them out.

Beau · May 10, 2021, 9:44pm

Hi Patrick!

Any news on language model evaluation support?

patrickvonplaten · May 16, 2021, 6:39pm

See Added Feature: Prefix decoding for wav2vec2 models by deepang17 · Pull Request #11606 · huggingface/transformers · GitHub

jolurf · May 27, 2021, 8:19pm

Wiki, when I run your code, it predicts only spaces and “-” characters. Any idea why?

ChristophBensch · June 15, 2021, 10:10pm

@jolurf you can also use this decoder (GitHub - parlance/ctcdecode: PyTorch CTC Decoder bindings). Take the labels from your tokenizer and create an n-gram language model with KenLM. After that, you can feed the logits from your Wav2Vec2 model into the decoder.
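
The "take the labels from your tokenizer" step can be sketched as follows. This assumes a vocabulary dict shaped like the one a Wav2Vec2 CTC tokenizer returns from `get_vocab()`, with `|` as the word delimiter and `<pad>` doubling as the CTC blank; the helper `labels_from_vocab` and the toy vocabulary are hypothetical. The point is that the label list must follow the index order of the model's output dimension, otherwise the decoder reads the logits against the wrong characters.

```python
def labels_from_vocab(vocab, word_delimiter="|", blank_token="<pad>"):
    """Return (labels ordered by output index, index of the CTC blank)."""
    ordered = sorted(vocab.items(), key=lambda item: item[1])
    labels = []
    for token, _ in ordered:
        # Map the word delimiter to a real space so decoded text
        # lines up with the words the n-gram LM was trained on.
        labels.append(" " if token == word_delimiter else token)
    return labels, vocab[blank_token]


# Hypothetical vocabulary; a real one would come from the tokenizer.
vocab = {"a": 3, "<pad>": 0, "|": 2, "t": 5, "<unk>": 1, "c": 4}

labels, blank_id = labels_from_vocab(vocab)
print(labels)    # ['<pad>', '<unk>', ' ', 'a', 'c', 't']
print(blank_id)  # 0
```

How special tokens such as `<unk>` should be represented depends on the decoder, so check its documentation before passing `labels` and `blank_id` along. The casing of the labels also has to match the casing of the text the LM was trained on.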

@patrickvonplaten Are there any updates on the transformer language model?

Voidful suggested combining the wav2vec2 probabilities with those of the GPT-2 model:
(huggingface_notebook/xlsr_gpt.ipynb at main · voidful/huggingface_notebook · GitHub)

However, in that notebook the CTC vocab seems to match the GPT vocab. Unfortunately, this is not the case for English. Is there already a solution?

DewiBrynJones · June 17, 2021, 2:09pm

If this discussion is still ongoing: there is a pull request currently open (Added Feature: Prefix decoding for wav2vec2 models by deepang17 · Pull Request #11606 · huggingface/transformers · GitHub) and, as @ChristophBensch mentions, a means of using KenLM via GitHub - parlance/ctcdecode: PyTorch CTC Decoder bindings. We have an example of this at GitHub - techiaith/docker-wav2vec2-xlsr-ft-cy (Hyfforddi modelau adnabod lleferydd Cymraeg wav2vec2 a KenLM a'u darparu drwy weinydd gwasanaeth API // Train wav2vec2 and KenLM models for Welsh language speech recognition and/or provide via a simple API server), which has reduced our WER for Welsh from 25% to 15%. Since our scripts use HuggingFace’s OSCAR dataset, they should be easily adaptable to train and optimize LMs for other lesser-resourced languages as well.

othiele · June 24, 2021, 8:11am

Thanks @DewiBrynJones for the implementation. I love the idea of having READMEs in your local language that reference an English version.

OthmaneJ · July 1, 2021, 8:09am

Hi all! As advised by @andersgb1, I used a KenLM n-gram language model on top of a distilled wav2vec2 that I trained, and it improved my WER (26 → 12.6). If you guys are interested, here’s the notebook (it runs seamlessly on Colab): OthmaneJ/distil-wav2vec2 · Hugging Face

agemagician · July 5, 2021, 2:58pm

@OthmaneJ
Could you please share the code you used for distilling wav2vec2?

junxtjx · July 5, 2021, 3:23pm

So, to use wav2vec2 with GPT-2 for English, would we just have to match the vocab used in wav2vec2 with the vocab used in GPT-2?

youssefav · July 7, 2021, 10:23am

Is integrating an LM for wav2vec2 basically pointless now with the release of HuBERT, which, if I understand correctly, is both an audio model and a language model at the same time? facebook/hubert-xlarge-ll60k · Hugging Face

I’m trying to achieve sub-5% WER (surpassing human performance), but I don’t know whether fine-tuning this HuBERT model on my own data will get there, because I’m not sure how the language model part works.

Does it also need an integration with a language model to actually make it perform well?

ThomasG · September 13, 2021, 7:21pm

Can I create a character-level LM with KenLM?
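
One approach that may work, though nobody in this thread has confirmed it: KenLM treats whitespace-separated tokens as its units, so a character-level LM can be obtained by rewriting the training corpus with one character per token and an explicit word-boundary symbol. A minimal sketch of that preprocessing (the helper `to_char_tokens` and the `|` boundary symbol are assumptions chosen for the example):

```python
def to_char_tokens(line, boundary="|"):
    """Rewrite a text line so every character is its own LM token.

    Words are separated by an explicit boundary token, since plain
    spaces are consumed as token separators.
    """
    words = line.strip().split()
    return f" {boundary} ".join(" ".join(word) for word in words)


corpus = ["hi there", "the cat sat"]
for line in corpus:
    print(to_char_tokens(line))
# h i | t h e r e
# t h e | c a t | s a t
```

The rewritten corpus would then be used to train the n-gram model as usual. The decoder has to query the LM once per character rather than once per word, which not every CTC decoder supports, so that needs checking for the decoder in use.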
