CUDA memory imbalanced across multiple GPUs, and multiple processes on device 0

Hello, I am using 8 x V100 GPUs to finetune the deberta-v3-large model for sequence classification, with transformers and the Trainer API. After running `accelerate config` to specify a DeepSpeed config, I start training with `accelerate launch main.py`, but the processes are not initialized properly and the CUDA memory usage is not balanced at all.
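For context on the "multiple processes on device 0" symptom: under `accelerate launch`, each spawned worker receives a `LOCAL_RANK` environment variable and is expected to bind to the GPU matching that rank. A minimal sketch of that per-process device selection (this is illustrative, not code from my `main.py`; the `"0"` fallback is an assumption):

```python
import os

# Under `accelerate launch`, each worker process gets LOCAL_RANK=0..N-1.
# If several processes all appear on cuda:0 in nvidia-smi, it usually
# means the device was never selected per local rank.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))  # "0" fallback is an assumption
device = f"cuda:{local_rank}"
print(device)  # one distinct device string per worker, e.g. cuda:0, cuda:1, ...
```

Normally the Trainer/Accelerate stack handles this binding itself, which is why I suspect something in my config is off.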


Here is my config file:

It is pretty weird; I am just using the Hugging Face Trainer API in my code.