Wav2vec: how to run decoding with a language model?

Hello.
I am fine-tuning wav2vec2 ("wav2vec2-large-lv60") on my own dataset. I followed Patrick's tutorial (Fine-Tune Wav2Vec2 for English ASR in Hugging Face with 🤗 Transformers) and successfully finished the fine-tuning (thanks for the very nice tutorial).

Now, I would like to run decoding with a language model and have a few questions.

  1. Can we run decoding with a language model directly from Hugging Face?
  2. If not, how can I make the wav2vec2 model compatible with the fairseq decoding script (fairseq/examples/speech_recognition/infer.py)?

I took the following steps, but decoding failed:

  1. Create a '.pt' file from the fine-tuning checkpoint (a reload sanity check is sketched below, after step 2)
    import torch
    from transformers import Wav2Vec2ForCTC

    def save_model(my_checkpoint_path):
        model = Wav2Vec2ForCTC.from_pretrained(my_checkpoint_path)
        # Save only the model weights as a plain state_dict
        torch.save(model.state_dict(), "my_model.pt")

  2. Decoding
    I used the decoding command from the following page: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md#evaluating-a-ctc-model
    subset=dev_other
    python examples/speech_recognition/infer.py /checkpoint/abaevski/data/speech/libri/10h/wav2vec/raw --task audio_pretraining \
        --nbest 1 --path /path/to/model --gen-subset $subset --results-path /path/to/save/results/for/sclite --w2l-decoder kenlm \
        --lm-model /path/to/kenlm.bin --lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 \
        --post-process letter
    I replaced /path/to/model with "my_model.pt".
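
As a quick check on step 1 (hypothetical usage; the checkpoint path is a placeholder), the saved file can be reloaded as an ordinary PyTorch state_dict:

    save_model("path/to/my_checkpoint")
    model = Wav2Vec2ForCTC.from_pretrained("path/to/my_checkpoint")
    model.load_state_dict(torch.load("my_model.pt"))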

Then I get the following error message:
Traceback (most recent call last):
  File "/mount/fairseq/examples/speech_recognition/infer.py", line 427, in <module>
    cli_main()
  File "/mount/fairseq/examples/speech_recognition/infer.py", line 423, in cli_main
    main(args)
  File "/mount/fairseq/examples/speech_recognition/infer.py", line 229, in main
    models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task(
  File "/mount/fairseq/fairseq/checkpoint_utils.py", line 370, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/mount/fairseq/fairseq/checkpoint_utils.py", line 304, in load_checkpoint_to_cpu
    state = _upgrade_state_dict(state)
  File "/mount/fairseq/fairseq/checkpoint_utils.py", line 456, in _upgrade_state_dict
    {"criterion_name": "CrossEntropyCriterion", "best_loss": state["best_loss"]}
KeyError: 'best_loss'

When I googled the error, it seems related to the removal of the optimization history logs:

This happens because we remove the unneeded optimization history logs from the model to reduce the file size; only the model weights are kept for release. As a result, if you load the model directly, an error is reported because those logs are missing.
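
Indeed, the file saved in step 1 is a bare state_dict, while fairseq's load_checkpoint_to_cpu expects a checkpoint dictionary with top-level entries such as 'args'/'cfg', 'model', 'optimizer_history', and 'extra_state'. A minimal diagnostic sketch (the key names follow fairseq's checkpoint_utils; the file name matches step 1):

    import torch

    # A bare state_dict maps parameter names to tensors; a fairseq checkpoint
    # would instead carry entries like 'model' and 'optimizer_history'.
    state = torch.load("my_model.pt", map_location="cpu")
    print(list(state.keys())[:5])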

So how can I save the fine-tuned model so that it is compatible with fairseq? Should I store the optimization history? If so, how can I do it? Does anyone have experience with this? If so, could you please share it with me? Thank you always.

Oh, I found the following previous discussion on the forum. Sorry for missing it.

So I will check it out first. Thanks.

Hi @Su-Youn,

Did you manage to add LM decoding?

Yes. I used the following code with some updates.
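
In outline it looks like the sketch below: LM-boosted CTC decoding with pyctcdecode and a KenLM model (the model name, the lm.arpa path, and the speech array are placeholders, not my exact setup):

    import torch
    from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
    from pyctcdecode import build_ctcdecoder

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h-lv60")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-960h-lv60")

    # Build a CTC beam-search decoder from the tokenizer vocabulary
    # (sorted by token id) plus a KenLM language model.
    vocab = sorted(processor.tokenizer.get_vocab().items(), key=lambda kv: kv[1])
    decoder = build_ctcdecoder([token for token, _ in vocab], kenlm_model_path="lm.arpa")

    # speech: a 1-D float array of 16 kHz audio samples
    inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits[0].cpu().numpy()

    print(decoder.decode(logits))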

Thank you! I will give this a try 🙂

This could also help: GitHub - patrickvonplaten/Wav2Vec2_PyCTCDecode: Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode

Hi @patrickvonplaten, I have trained a model using wav2vec2 and tried to use an n-gram LM with it, but I get errors such as the input shape not matching. Also, my model didn't generate an alphabet.json file. I tried your blog post Boosting Wav2Vec2 with n-grams in 🤗 Transformers, but it didn't work on my model, though it worked on some other models. Could you tell me what's wrong with my model training? Here is my model: nihalbaig/wav2vec2-large-xlsr-bn · Hugging Face.
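
For context: alphabet.json is written when a Wav2Vec2ProcessorWithLM is saved, along the lines of that blog post. A minimal sketch, reusing a processor and pyctcdecode decoder like the ones above (the output directory name is a placeholder):

    from transformers import Wav2Vec2ProcessorWithLM

    # Bundle the feature extractor, tokenizer, and pyctcdecode decoder;
    # saving this processor writes alphabet.json alongside the language model.
    processor_with_lm = Wav2Vec2ProcessorWithLM(
        feature_extractor=processor.feature_extractor,
        tokenizer=processor.tokenizer,
        decoder=decoder,
    )
    processor_with_lm.save_pretrained("wav2vec2-with-lm")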