Making predictions in Boosting wav2vec2 with n-grams

I was trying to use the code from the Boosting wav2vec2 with n-grams language model blog. I couldn't find any method for making predictions in that blog, so I used the one from the Fine-tuning wav2vec2 for English blog. I am looking for an ASR model for the Urdu language. The colab notebook is available here. Here is the error I am facing:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-64-9236127eb9fb> in <module>
----> 1 predictions = [convertor(dataset, x) for x in range(l)]
      2 predictions[0]

3 frames
/usr/local/lib/python3.7/dist-packages/pyctcdecode/decoder.py in decode_beams(self, logits, beam_width, beam_prune_logp, token_min_logp, prune_history, hotwords, hotword_weight, lm_start_state)
    514             raise ValueError(
    515                 "Input logits of size %s, but vocabulary is size %s"
--> 516                 % (logits.shape[-1], len(self._idx2vocab))
    517             )
    518         # prepare hotword input

ValueError: Input logits of size 669, but vocabulary is size 52

The wav2vec2 model makes character-level predictions, while the n-gram model appears to be making word-level predictions. It would be very nice if someone could help. @patrickvonplaten
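For what it's worth, the error comes from pyctcdecode validating the last axis of the logits against its vocabulary: it expects an array of shape (time_steps, vocab_size), so a size of 669 against a vocabulary of 52 suggests the array axes may be swapped (or the wrong tensor is being passed). A minimal sketch reproducing that check with numpy, using the shapes from the error message (the `check_logits` helper here is hypothetical, mimicking the validation in pyctcdecode's `decode_beams`):

```python
import numpy as np

VOCAB_SIZE = 52  # character vocabulary size, as reported in the error

def check_logits(logits: np.ndarray) -> None:
    """Mimic pyctcdecode's input validation in decode_beams (hypothetical helper)."""
    if logits.shape[-1] != VOCAB_SIZE:
        raise ValueError(
            "Input logits of size %s, but vocabulary is size %s"
            % (logits.shape[-1], VOCAB_SIZE)
        )

# pyctcdecode expects logits shaped (time_steps, vocab_size)
good = np.zeros((669, 52))
check_logits(good)  # passes: last axis matches the vocabulary

bad = good.T  # (52, 669): axes swapped, last axis is now the time dimension
try:
    check_logits(bad)
except ValueError as e:
    print(e)  # same message as in the traceback above
```

If the logits going into the decoder have shape (52, 669) rather than (669, 52), transposing them (or taking `.logits` from the model output before any reshaping) may resolve the mismatch.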

Thanks

@omar47 Did you find a solution, or are you still facing this issue?
I am facing the same problem as well.