Wav2vec: how to run decoding with a language model?

Hello.
I am fine-tuning wav2vec2 ("wav2vec2-large-lv60") on my own dataset. I followed Patrick's tutorial (Fine-Tune Wav2Vec2 for English ASR in Hugging Face with 🤗 Transformers) and successfully finished the fine-tuning (thanks for the very nice tutorial).

Now, I would like to run decoding with a language model and have a few questions.

  1. Can we run decoding with a language model directly from Hugging Face?
  2. If not, how can I make the wav2vec2 model compatible with the fairseq decoding script (fairseq/examples/speech_recognition/infer.py)?

I took the following steps, but decoding failed:

  1. Create a '.pt' file from the fine-tuning checkpoint (a reload sanity check is sketched below, after step 2)
    import torch
    from transformers import Wav2Vec2ForCTC

    def save_model(my_checkpoint_path):
        model = Wav2Vec2ForCTC.from_pretrained(my_checkpoint_path)
        # Save only the model weights as a plain state_dict
        torch.save(model.state_dict(), "my_model.pt")

  2. Decoding
    I used the decoding command from the following page: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md#evaluating-a-ctc-model
    subset=dev_other
    python examples/speech_recognition/infer.py /checkpoint/abaevski/data/speech/libri/10h/wav2vec/raw --task audio_pretraining \
        --nbest 1 --path /path/to/model --gen-subset $subset --results-path /path/to/save/results/for/sclite --w2l-decoder kenlm \
        --lm-model /path/to/kenlm.bin --lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 \
        --post-process letter
    I replaced /path/to/model with "my_model.pt".
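
As a quick check on step 1 (hypothetical usage; the checkpoint path is a placeholder), the saved file can be reloaded as an ordinary PyTorch state_dict:

    save_model("path/to/my_checkpoint")
    model = Wav2Vec2ForCTC.from_pretrained("path/to/my_checkpoint")
    model.load_state_dict(torch.load("my_model.pt"))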

Then I get the following error message:
Traceback (most recent call last):
  File "/mount/fairseq/examples/speech_recognition/infer.py", line 427, in <module>
    cli_main()
  File "/mount/fairseq/examples/speech_recognition/infer.py", line 423, in cli_main
    main(args)
  File "/mount/fairseq/examples/speech_recognition/infer.py", line 229, in main
    models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task(
  File "/mount/fairseq/fairseq/checkpoint_utils.py", line 370, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/mount/fairseq/fairseq/checkpoint_utils.py", line 304, in load_checkpoint_to_cpu
    state = _upgrade_state_dict(state)
  File "/mount/fairseq/fairseq/checkpoint_utils.py", line 456, in _upgrade_state_dict
    {"criterion_name": "CrossEntropyCriterion", "best_loss": state["best_loss"]}
KeyError: 'best_loss'

When I googled the error, it seems related to the removal of the optimization history logs:

This happens because we remove the unneeded optimization history logs from the model to reduce the file size; only the model weights are kept for release. As a result, if you load the model directly, an error is reported because those logs are missing.
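
Indeed, the file saved in step 1 is a bare state_dict, while fairseq's load_checkpoint_to_cpu expects a checkpoint dictionary with top-level entries such as 'args'/'cfg', 'model', 'optimizer_history', and 'extra_state'. A minimal diagnostic sketch (the key names follow fairseq's checkpoint_utils; the file name matches step 1):

    import torch

    # A bare state_dict maps parameter names to tensors; a fairseq checkpoint
    # would instead carry entries like 'model' and 'optimizer_history'.
    state = torch.load("my_model.pt", map_location="cpu")
    print(list(state.keys())[:5])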

So how can I save the fine-tuned model so that it is compatible with fairseq? Should I store the optimization history? If so, how can I do it? Does anyone have experience with this? If so, could you please share it with me? Thank you always.

Oh, I found the following previous discussion on the forum. Sorry for missing it.

So I will check it out first. Thanks.

Hi @Su-Youn,

Did you manage to add LM decoding?

Yes. I used the following code with some updates.
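
In outline it looks like the sketch below: LM-boosted CTC decoding with pyctcdecode and a KenLM model (the model name, the lm.arpa path, and the speech array are placeholders, not my exact setup):

    import torch
    from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
    from pyctcdecode import build_ctcdecoder

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h-lv60")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-960h-lv60")

    # Build a CTC beam-search decoder from the tokenizer vocabulary
    # (sorted by token id) plus a KenLM language model.
    vocab = sorted(processor.tokenizer.get_vocab().items(), key=lambda kv: kv[1])
    decoder = build_ctcdecoder([token for token, _ in vocab], kenlm_model_path="lm.arpa")

    # speech: a 1-D float array of 16 kHz audio samples
    inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits[0].cpu().numpy()

    print(decoder.decode(logits))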

Thank you! I will give this a try 🙂

This could also help: GitHub - patrickvonplaten/Wav2Vec2_PyCTCDecode: Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode

Hi @patrickvonplaten, I have trained a model using wav2vec2 and tried to use an n-gram LM with it, but I get errors such as the input shape not matching. Also, my model didn't generate an alphabet.json file. I tried your blog post Boosting Wav2Vec2 with n-grams in 🤗 Transformers, but it didn't work on my model, though it worked on some other models. Could you tell me what's wrong with my model training? Here is my model: nihalbaig/wav2vec2-large-xlsr-bn · Hugging Face.
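
For context: alphabet.json is written when a Wav2Vec2ProcessorWithLM is saved, along the lines of that blog post. A minimal sketch, reusing a processor and pyctcdecode decoder like the ones above (the output directory name is a placeholder):

    from transformers import Wav2Vec2ProcessorWithLM

    # Bundle the feature extractor, tokenizer, and pyctcdecode decoder;
    # saving this processor writes alphabet.json alongside the language model.
    processor_with_lm = Wav2Vec2ProcessorWithLM(
        feature_extractor=processor.feature_extractor,
        tokenizer=processor.tokenizer,
        decoder=decoder,
    )
    processor_with_lm.save_pretrained("wav2vec2-with-lm")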