I followed a tutorial (Fine-Tune XLSR-Wav2Vec2 for low-resource ASR with 🤗 Transformers) to learn how to fine-tune a wav2vec2.0 model.
I managed to fine-tune the model on the Common Voice dataset.
Then I tried to repeat the process with my own custom dataset, but ran into a 'CUDA out of memory' error.
My audio files are each around 10 minutes long, so I suspect they are simply too long to fit in GPU memory.
Is there a recommended way to handle such long audio files when fine-tuning a wav2vec2.0 model?
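One workaround I've been considering is splitting each recording into shorter segments before feature extraction. Here is a minimal sketch of what I mean (numpy only; the 20-second chunk length and 16 kHz sample rate are my own assumptions, not something from the tutorial):

```python
import numpy as np

def chunk_waveform(waveform, sample_rate=16000, chunk_seconds=20.0):
    """Split a 1-D waveform array into fixed-length chunks.

    The 20 s chunk length is a guess at what might fit in GPU memory;
    wav2vec2's memory use grows with sequence length, so shorter
    chunks trade context for a smaller footprint.
    """
    chunk_len = int(sample_rate * chunk_seconds)
    return [waveform[i:i + chunk_len] for i in range(0, len(waveform), chunk_len)]

# Simulate a 10-minute recording at 16 kHz (the rate wav2vec2 expects).
ten_minutes = np.zeros(16000 * 600, dtype=np.float32)
chunks = chunk_waveform(ten_minutes)
print(len(chunks))  # 600 s / 20 s = 30 chunks
```

My concern with this approach is that my transcripts cover the whole file, so I don't know how to align the text with each chunk — which is partly why I'm asking.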
Any suggestions are appreciated.