I am trying to run Wav2Vec2 on a Raspberry Pi.
It works with the model ‘facebook/wav2vec2-base-960h’, but computing the logits is very slow (12 seconds for a 2.6-second audio clip).
So I looked for a smaller model and found “wav2vec2_tiny_random_robust”, provided by Patrick von Platen on the Hub (patrickvonplaten/wav2vec2_tiny_random_robust · Hugging Face) and trained on librispeech_asr.
However, when I use this model and its associated tokenizer, the transcription comes out garbled:
fagfeaa ggbe gea b abf’] babe
instead of [‘LY EVIDENT THAT THE TIME FOR AN INQUIRY WILL COME’] with the ‘base’ model.
On the model card of the “tiny” model, the computation of the loss is shown, but not how to decode the predicted letter sequence.
Is this a tokenizer problem, or do I need to fine-tune the model?
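For context, the decoding step used with the ‘base’ model is plain greedy CTC decoding: take the argmax over the vocabulary at each time step, collapse consecutive repeated tokens, and drop the padding/blank token. In `transformers` this is the usual `processor.batch_decode(torch.argmax(logits, dim=-1))` pattern. Here is a minimal sketch of what that decoding does, with a toy vocabulary (the vocabulary and logits below are made up for illustration; the real vocabulary comes from the model's tokenizer):

```python
import numpy as np

# Toy vocabulary standing in for a Wav2Vec2 tokenizer's vocab
# (hypothetical; index 0 plays the role of the CTC blank/pad token).
VOCAB = ["<pad>", "H", "E", "L", "O"]

def ctc_greedy_decode(logits: np.ndarray) -> str:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    ids = list(logits.argmax(axis=-1))            # best token per time step
    collapsed = [i for i, prev in zip(ids, [None] + ids[:-1]) if i != prev]
    return "".join(VOCAB[i] for i in collapsed if i != 0)

# Example: frames spelling H E L L O, with a blank separating the two L's
logits = np.eye(len(VOCAB))[[1, 1, 2, 3, 3, 0, 3, 4]]
print(ctc_greedy_decode(logits))  # prints "HELLO"
```

If the model itself outputs essentially random logits (as a randomly initialized “tiny” checkpoint would), this decoding step can only produce gibberish, no matter which tokenizer is used.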