Multi-GPU Audio Fine-Tuning for Wav2Vec2 Failing on 4 GPUs but Successful on 1 GPU

Hi All,

Thanks in advance for your help.

Context: I am having an issue with training using the Trainer in a multi-GPU setup. I am fine-tuning Wav2Vec 2.0 on a dataset with >100,000 audio segments, and hence I cannot realistically run the full job on a single GPU.

What am I doing?
To build a script that I can test on both a single-GPU and a multi-GPU setup, I am picking only 1,000 short audio segments (<6 seconds) and their transcriptions for fine-tuning.
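For reference, the subsampling looks roughly like this (a minimal sketch assuming a 🤗 `datasets` dataset with an `audio` column; the loading call and paths are placeholders for my actual data):

```python
from datasets import load_dataset, Audio

# Placeholder loading call -- the real dataset lives elsewhere.
ds = load_dataset("audiofolder", data_dir="data/segments", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

MAX_SECONDS = 6.0

def is_short(example):
    # Keep only segments shorter than 6 seconds.
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"] < MAX_SECONDS

# Filter to short clips, then take a fixed 1,000-segment subset
# for the single-GPU vs multi-GPU comparison.
ds = ds.filter(is_short)
ds = ds.shuffle(seed=42).select(range(1000))
```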

What’s happening?
When I train on a single GPU, training runs successfully and I have no problems with the 1,000 audio segments. However, when I try to do the same thing with 4 GPUs, it crashes with an out-of-memory error (RuntimeError: CUDA out of memory. Tried to allocate XXXXXXXX).

Help Required?
How can I make this work? As per the article here (From PyTorch DDP to Accelerate to Trainer, mastery of distributed training with ease), the Trainer takes care of distributed training by itself, so there is clearly something wrong that I am unable to debug. Any help here would be appreciated.
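For reference, here is a trimmed-down sketch of roughly what the training setup looks like. The hyperparameter values follow the XLSR notebook linked below; `processor`, `data_collator`, `train_ds`, and `eval_ds` are placeholders for the objects built earlier in the script, and the launcher mentioned afterwards is likewise an assumption about how the 4-GPU job is started.

```python
from transformers import Wav2Vec2ForCTC, TrainingArguments, Trainer

# Model is loaded as in the XLSR notebook (placeholder `processor` assumed).
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
)
model.freeze_feature_encoder()
model.gradient_checkpointing_enable()

training_args = TrainingArguments(
    output_dir="./wav2vec2-finetuned",
    group_by_length=True,            # batch similarly-sized segments together
    per_device_train_batch_size=16,  # per GPU, so 4 GPUs see 4x this per step
    gradient_accumulation_steps=2,
    evaluation_strategy="steps",
    num_train_epochs=30,
    fp16=True,
    save_steps=400,
    eval_steps=400,
    logging_steps=400,
    learning_rate=3e-4,
    warmup_steps=500,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,          # padding collator from the notebook
    train_dataset=train_ds,               # the 1,000-segment subset
    eval_dataset=eval_ds,
    tokenizer=processor.feature_extractor,
)

trainer.train()
```

On a single GPU I run the script directly with `python train.py`; for the 4-GPU attempt it is launched with `torchrun --nproc_per_node=4 train.py` (script name here is a placeholder).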

Script similar to this notebook: Fine-Tune XLSR-Wav2Vec2 for low-resource ASR with 🤗 Transformers
Hardware: Tesla V100-SXM2-32GB