Language model for wav2vec2.0 decoding

@patrickvonplaten @Beau
Hi guys, I had also implemented simple KenLM beam-search decoding for Wav2Vec2ForCTC using: GitHub - parlance/ctcdecode: PyTorch CTC Decoder bindings

You may find it useful.

Here is the repo:


This was very helpful. Thanks for posting it.

Could you please share the code you used for distilling wav2vec2?

Hey guys,

I’ve done some benchmarking with the pyctcdecode library, and I think it works quite well in combination with Transformers.

Here is a repo where you can find some comparisons between Wav2Vec2 + LM vs. Wav2Vec2 + no LM as well as all the necessary scripts to run the eval: GitHub - patrickvonplaten/Wav2Vec2_PyCTCDecode: Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode
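The comparison in that repo comes down to word error rate (WER), which is just word-level edit distance normalized by reference length. Here is a minimal, self-contained sketch of that metric (a hypothetical helper, not taken from the repo):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[j] holds the edit distance between the first i ref words
    # and the first j hyp words (single-row dynamic programming).
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,        # deletion
                dp[j - 1] + 1,    # insertion
                prev + (r != h),  # substitution (free if words match)
            )
    return dp[-1] / max(len(ref), 1)
```

With this in hand, comparing Wav2Vec2 with and without an LM is just averaging `wer(ref, hyp)` over the eval set for each decoding setup.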


We now have a detailed blog post explaining step-by-step how to create an n-gram language model and how to integrate it with Transformers and pyctcdecode here:


@patrickvonplaten - thanks for that. I have a wav2vec2 model and a binary KenLM language model, both of which I built without using Hugging Face. I’m interested in porting my model to Hugging Face. Is this currently possible, or not yet?
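For anyone landing here with the same question: porting should be possible by wrapping the KenLM binary in a pyctcdecode decoder and bundling it with your tokenizer and feature extractor into a `Wav2Vec2ProcessorWithLM`. A rough, untested sketch — file names like `vocab.json` and `lm.binary` are placeholders for your own files:

```python
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ProcessorWithLM,
)
from pyctcdecode import build_ctcdecoder

# tokenizer/feature extractor matching your own wav2vec2 model
tokenizer = Wav2Vec2CTCTokenizer("vocab.json")
feature_extractor = Wav2Vec2FeatureExtractor(sampling_rate=16_000)

# decoder labels must be sorted by token id so they line up with the logits
vocab = sorted(tokenizer.get_vocab().items(), key=lambda kv: kv[1])
labels = [token for token, _ in vocab]

# wrap the binary KenLM model in a pyctcdecode beam-search decoder
decoder = build_ctcdecoder(labels, kenlm_model_path="lm.binary")

processor = Wav2Vec2ProcessorWithLM(
    feature_extractor=feature_extractor,
    tokenizer=tokenizer,
    decoder=decoder,
)
processor.save_pretrained("wav2vec2-with-lm")
```

After `save_pretrained`, loading with `Wav2Vec2ProcessorWithLM.from_pretrained` and calling `processor.batch_decode` on the model’s logits should give LM-rescored transcriptions.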

Sorry for resurrecting this, but this seems like the right place to ask - has anyone tried CTC decoding with models other than KenLM? Is it too slow, and are there any public attempts or examples of that? Sorry if that’s a stupid question - I realize that n-grams are much faster, but in some use cases (like mine) precision matters more than speed, and it seems that models like GPT-Neo could achieve much greater precision than an n-gram.

Patrick, I am preparing to use Wav2Vec2 with the language model you describe here - for my solution I particularly like pyctcdecode’s “hotwords” function. I noticed, however, that KenLM is distributed under the GNU Lesser General Public License (LGPL), which is much less permissive than the other licenses in the chain in terms of commercial use. Do you happen to have any intuition about whether using .arpa files produced by KenLM and then consumed by pyctcdecode/Wav2Vec2 forces inheritance of the LGPL? Thanks!

Hi everyone!

I tried to use a 3-gram language model trained with the kaldi-asr toolkit to build a Wav2Vec2ProcessorWithLM, instead of using a KenLM-based LM, but I received the error below:

OSError: Cannot read model ...
(lm/read_arpa.cc:99 in void lm::ReadBackoff(util::FilePiece&, lm::Prob&) threw FormatLoadException.  
Non-zero backoff -1.113 provided for an n-gram that should have no backoff in the 3-gram at byte 4082800 Byte: 4082800)
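This FormatLoadException is KenLM refusing backoff weights on highest-order n-grams, which the ARPA format doesn’t allow there (Kaldi tooling sometimes emits them anyway). A hypothetical stdlib-only sketch to locate the offending lines in the .arpa file before stripping or retraining:

```python
def find_bad_backoffs(arpa_lines, order):
    """Return (line_number, backoff) pairs for highest-order n-grams that
    carry a non-zero backoff weight -- the condition KenLM rejects."""
    bad, in_top = [], False
    for lineno, raw in enumerate(arpa_lines, 1):
        line = raw.rstrip("\n")
        if line.startswith("\\"):           # section header, e.g. \3-grams:
            in_top = line == f"\\{order}-grams:"
            continue
        if not in_top or not line.strip():
            continue
        fields = line.split("\t")
        # expected: "logprob<TAB>w1 ... wn"; a third field is a backoff weight
        if len(fields) > 2 and float(fields[-1]) != 0.0:
            bad.append((lineno, float(fields[-1])))
    return bad

# toy ARPA fragment (order 2 here) with one offending bigram backoff
arpa = [
    "\\data\\",
    "ngram 1=2",
    "ngram 2=2",
    "",
    "\\1-grams:",
    "-0.5\tthe\t-0.3",
    "-0.7\tcat\t-0.2",
    "",
    "\\2-grams:",
    "-0.4\tthe cat",
    "-0.6\tcat the\t-1.113",
    "\\end\\",
]
```

Running `find_bad_backoffs` over the real file with `order=3` should point at the same byte region the KenLM error reports.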

Is it a good idea to use a Kaldi-based LM instead of KenLM? (Both use the .arpa format.) @patrickvonplaten
Thanks for your attention!

Hey carrotpie, hope you’re doing fine.
Have you found a solution to your question here?
Is it possible to wrap a wav2vec2 model with an LM other than KenLM?
Have you had any experience with this?

Hey Sara,
have you found a solution to your question here?

No, not yet, I’m still waiting!

Hey, I have not tried to do it myself; I don’t think I would be skillful enough for that kind of task. The closest thing I found was a feature called “neural rescoring” within NVIDIA’s NeMo framework: GitHub - NVIDIA/NeMo: NeMo: a toolkit for conversational AI
Perhaps someone could hack around with code from NeMo and port it to Transformers, or at least get inspiration from it.

Ah, ok, thank you 🙂

I think we can directly use this script from NVIDIA’s NeMo toolkit along with Hugging Face Transformers:
https://github.com/NVIDIA/NeMo/blob/main/scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py

Hey @SaraSadeghi,

If you’re not able to load the language model with the KenLM library (GitHub - kpu/kenlm: KenLM: Faster and Smaller Language Model Queries), then it’ll be difficult to get it working with Transformers. Could you first check whether you can load it with KenLM?