I followed a tutorial (Fine-Tune XLSR-Wav2Vec2 for low-resource ASR with 🤗 Transformers) to learn how to fine-tune a wav2vec2.0 model.
I managed to fine-tune the model on the Common Voice dataset.
Then I tried to repeat the process with my own custom dataset, but ran into a 'CUDA out of memory' error.
My audio files are each around 10 minutes long, so I suspect they are simply too long to fit in GPU memory.
Is there a recommended way to handle such long audio files when fine-tuning a wav2vec2.0 model?
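One workaround I've been considering is splitting each recording into shorter segments before feature extraction. Here is a minimal sketch of what I mean (numpy only; the 20-second chunk length and 16 kHz sample rate are my own assumptions, not something from the tutorial):

```python
import numpy as np

def chunk_waveform(waveform, sample_rate=16000, chunk_seconds=20.0):
    """Split a 1-D waveform array into fixed-length chunks.

    The 20 s chunk length is a guess at what might fit in GPU memory;
    wav2vec2's memory use grows with sequence length, so shorter
    chunks trade context for a smaller footprint.
    """
    chunk_len = int(sample_rate * chunk_seconds)
    return [waveform[i:i + chunk_len] for i in range(0, len(waveform), chunk_len)]

# Simulate a 10-minute recording at 16 kHz (the rate wav2vec2 expects).
ten_minutes = np.zeros(16000 * 600, dtype=np.float32)
chunks = chunk_waveform(ten_minutes)
print(len(chunks))  # 600 s / 20 s = 30 chunks
```

My concern with this approach is that my transcripts cover the whole file, so I don't know how to align the text with each chunk — which is partly why I'm asking.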
Any suggestions are appreciated.