I was trying to use the code from the Boosting Wav2Vec2 with n-grams language model blog post. I couldn't find any method for making predictions in that blog, so I used the one from the Fine-tuning Wav2Vec2 for English ASR blog. I am looking for an ASR model for the Urdu language. The Colab notebook is available here.
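Roughly, my prediction code looks like the sketch below. The checkpoint name and the KenLM file path are placeholders for the actual ones in my notebook, `dataset` is the test split loaded there, and the decoder is built as in the n-gram blog:

```python
import torch
from pyctcdecode import build_ctcdecoder
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Placeholder names: in the notebook these point to my fine-tuned Urdu
# checkpoint and the KenLM n-gram file I built.
model = Wav2Vec2ForCTC.from_pretrained("my-urdu-checkpoint")
processor = Wav2Vec2Processor.from_pretrained("my-urdu-checkpoint")

# Build the beam-search decoder from the tokenizer vocabulary, sorted by
# token id, as shown in the n-gram blog post.
vocab_dict = processor.tokenizer.get_vocab()
sorted_vocab = [k for k, _ in sorted(vocab_dict.items(), key=lambda kv: kv[1])]
decoder = build_ctcdecoder(sorted_vocab, kenlm_model_path="urdu_5gram.arpa")

def convertor(dataset, i):
    # Feature-extract one example and run the acoustic model.
    inputs = processor(dataset[i]["audio"]["array"],
                       sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits[0]
    # decode() runs decode_beams() internally; this is where the
    # ValueError below is raised.
    return decoder.decode(logits.numpy())
```

Here is the error I am facing: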
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-64-9236127eb9fb> in <module>
----> 1 predictions = [convertor(dataset, x) for x in range(l)]
      2 predictions[0]

3 frames

/usr/local/lib/python3.7/dist-packages/pyctcdecode/decoder.py in decode_beams(self, logits, beam_width, beam_prune_logp, token_min_logp, prune_history, hotwords, hotword_weight, lm_start_state)
    514             raise ValueError(
    515                 "Input logits of size %s, but vocabulary is size %s"
--> 516                 % (logits.shape[-1], len(self._idx2vocab))
    517             )
    518         # prepare hotword input

ValueError: Input logits of size 669, but vocabulary is size 52
```
The error comes from pyctcdecode's shape check: the model's logits have 669 entries per time step, but the decoder was built with a vocabulary of only 52 labels. The Wav2Vec2 model makes character-level predictions, while the n-gram decoder appears to be making word-level predictions.
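For reference, a minimal shape check (same placeholder names as in the sketch above) prints the two numbers that have to agree:

```python
# decode_beams() requires logits.shape[-1] == number of decoder labels;
# in my run these two values disagree.
print("model output dim:", model.config.vocab_size)  # 669 in my run
print("decoder labels  :", len(sorted_vocab))        # 52 in my run
```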
It would be very nice if someone could help. @patrickvonplaten
Thanks