I am currently using the facebook/wav2vec2-base
model for an audio classification task. My code is based on the official HF Audio Classification tutorial.
In this tutorial, audio is sampled at a sample_rate
of 16,000 Hz. This means that 1 second of audio results in an array of length 16,000. Also, as far as I know, Wav2Vec2's
maximum input length is 150,000 samples.
Does this mean that, without any chunking, the model can only process audio clips up to ~10 seconds (150,000 / 16,000 ≈ 9.4 s)?
If you have longer audio, which I guess is usually the case, what strategies can be applied to mitigate this issue?
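For context on the kind of chunking I mean: a common workaround is to split the waveform into fixed-length (optionally overlapping) windows, run each window through the model, and aggregate the per-window logits (e.g. by averaging or majority vote). Below is a minimal sketch of the splitting step; the function name, window length, and overlap are my own choices, not anything prescribed by the tutorial:

```python
import numpy as np

def chunk_waveform(waveform, sr=16_000, chunk_s=10.0, overlap_s=1.0):
    """Split a 1-D waveform into fixed-length, optionally overlapping chunks.

    The final chunk may be shorter; the feature extractor can pad it later.
    """
    chunk_len = int(chunk_s * sr)
    step = int((chunk_s - overlap_s) * sr)
    chunks = []
    for start in range(0, len(waveform), step):
        chunks.append(waveform[start:start + chunk_len])
        if start + chunk_len >= len(waveform):
            break
    return chunks

# 25 s of dummy audio at 16 kHz -> three ~10 s windows with 1 s overlap
audio = np.zeros(25 * 16_000, dtype=np.float32)
chunks = chunk_waveform(audio)

# At inference time one would then (sketch, not run here) feed each chunk
# through the feature extractor and model, and average the logits:
#   all_logits = [model(**fe(c, sampling_rate=16_000,
#                            return_tensors="pt")).logits for c in chunks]
#   prediction = torch.stack(all_logits).mean(dim=0).argmax(-1)
```

Averaging logits treats every window as equally informative, which is a simplification; for tasks where the label depends on a short event, max-pooling the logits or a majority vote over per-chunk predictions may work better.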