Wav2Vec2Phoneme phoneme label lengths seem off

MagnusSonne · March 18, 2024, 11:19am

I’m using a wav2vec2 model (I have tried several different versions) to transcribe phonemes and output their onsets and offsets. The model produces decent phoneme labels, but the phoneme durations seem off. According to the model, most phonemes last 20 ms (a few last 40 ms), so most words consist of 20 ms of a label, 0-100 ms of nothing, 20 ms of another label, 0-100 ms of nothing, and so on. Is there a reason why the model outputs such short phoneme durations, and is there a way to ‘fix’ it?
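
To make the pattern concrete, here is a minimal sketch of how frame-level output can turn into the segments described above. It assumes each output frame covers 20 ms (the wav2vec2 feature encoder has a total stride of 320 samples, which is 20 ms at 16 kHz) and that the gaps of "nothing" are CTC blank frames. The blank id of 0, the example frame ids, and the helper names `ctc_segments` and `extend_to_next_onset` are all made up for illustration; they are not part of any library. Extending each label up to the next onset is only a rough heuristic, not a real alignment.

```python
from itertools import groupby

FRAME_MS = 20  # assumption: 320-sample encoder stride at 16 kHz


def ctc_segments(frame_ids, blank_id=0, frame_ms=FRAME_MS):
    """Collapse per-frame argmax ids into (label_id, onset_ms, offset_ms) runs.

    Blank frames are skipped, so they show up as gaps between segments.
    """
    segments = []
    frame = 0
    for label_id, run in groupby(frame_ids):
        length = len(list(run))
        if label_id != blank_id:
            segments.append((label_id, frame * frame_ms, (frame + length) * frame_ms))
        frame += length
    return segments


def extend_to_next_onset(segments):
    """Heuristic: stretch each segment's offset to the next segment's onset.

    The last segment is left unchanged because there is no following onset.
    """
    extended = []
    for i, (label_id, onset, offset) in enumerate(segments):
        if i + 1 < len(segments):
            offset = segments[i + 1][1]
        extended.append((label_id, onset, offset))
    return extended


# Hypothetical per-frame argmax ids: 0 is the blank, other ids are phonemes.
frame_ids = [0, 5, 0, 0, 0, 7, 7, 0, 0, 3, 0]

raw = ctc_segments(frame_ids)
filled = extend_to_next_onset(raw)

print(raw)     # short 20-40 ms labels separated by gaps
print(filled)  # the same labels with the gaps absorbed into the preceding label
```

With these made-up ids, the raw segments are one or two frames long with blank gaps in between, which matches the 20 ms / 40 ms durations I am seeing. The second function only shows one possible way to fill the gaps; a forced alignment against the known phoneme sequence would probably give more trustworthy boundaries.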
