Hello,
I am trying to run Wav2Vec2 on a Raspberry Pi.
I did it with the model ‘facebook/wav2vec2-base-960h’, but computing the logits with this model takes a very long time (about 12 seconds for a 2.6 second audio clip).
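For context, here is roughly the pipeline I am timing. The file name is a placeholder and I load the audio with librosa, but any loader that returns 16 kHz mono float audio should behave the same:

```python
import time

import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_name = "facebook/wav2vec2-base-960h"
processor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2ForCTC.from_pretrained(model_name)

# "sample.wav" stands in for my 2.6 second recording, resampled to 16 kHz
speech, _ = librosa.load("sample.wav", sr=16000)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

# The forward pass below is the step that takes ~12 seconds on the Pi
start = time.time()
with torch.no_grad():
    logits = model(inputs.input_values).logits
print(f"logits computed in {time.time() - start:.1f} s")

# Greedy CTC decoding of the letter sequence
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))
```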
So I looked for a smaller model, and I found “wav2vec2_tiny_random_robust” provided by Patrick von Platen on the hub (patrickvonplaten/wav2vec2_tiny_random_robust · Hugging Face), trained on librispeech_asr.
However, when I use this model and its associated tokenizer, it does not work. I get

[‘ fagfeaa ggbe gea b abf’]

instead of [‘LY EVIDENT THAT THE TIME FOR AN INQUIRY WILL COME’] with the ‘base’ model.
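The only thing I change is the checkpoint name (I am assuming the tiny repo ships a feature extractor and tokenizer that Wav2Vec2Processor can load; if it only provides a tokenizer, that may already be part of the problem):

```python
# Same pipeline as above, with only the checkpoint swapped.
tiny_name = "patrickvonplaten/wav2vec2_tiny_random_robust"
processor = Wav2Vec2Processor.from_pretrained(tiny_name)  # assumes the repo has a processor/tokenizer config
model = Wav2Vec2ForCTC.from_pretrained(tiny_name)
```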
On the model page of the “tiny” model, the computation of the loss is shown, but not how to decode the predicted letter sequence.
Is this a tokenizer problem, or do I need to fine-tune the model?