According to: Wav2vec 2.0: Learning the structure of speech from raw audio
Wav2vec 2.0 tackles this issue by learning basic units that are 25ms long to enable learning of high-level contextualized representations.
and
The model first processes the raw waveform of the speech audio with a multilayer convolutional neural network to get latent audio representations of 25ms each.
- Why did they use 25ms and not 20ms or 30ms?
- To be sure I understand correctly: if the input wav file to the wav2vec2 model is 3.4 seconds long, will the model (the conv layers) split it into 136 pieces (3.4 * 1000 / 25)? A minimal sketch of how I'd check this empirically is below.
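
One way to check the frame count empirically (assuming the Hugging Face `transformers` `Wav2Vec2Model` and the `facebook/wav2vec2-base-960h` checkpoint, which are not mentioned in the article) would be to feed a 3.4-second dummy waveform through the model and look at the sequence length of the output:

```python
import torch
from transformers import Wav2Vec2Model

# Assumed checkpoint; any standard wav2vec2 checkpoint should use the same conv feature encoder.
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
model.eval()

# 3.4 s of dummy audio at the 16 kHz sampling rate the model expects
waveform = torch.zeros(1, int(3.4 * 16_000))  # shape: (batch, samples)

with torch.no_grad():
    out = model(waveform)

# last_hidden_state has shape (batch, num_frames, hidden_size);
# num_frames is how many pieces the conv feature encoder produced.
print(out.last_hidden_state.shape)
```

The conv layers' strides and kernel sizes should also be readable from `model.config.conv_stride` and `model.config.conv_kernel`, which would let the same count be derived by hand.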