I am currently using accelerate for multi-GPU training. Running python train.py on a single GPU works fine. However, when I launch the following command, RAM usage keeps increasing until the run eventually fails:
CUDA_VISIBLE_DEVICES=2,3 accelerate launch --num_processes 2 train.py
Execution Environment:
accelerate : 0.28.0
python : 3.8.10
cuda : 12.1 (per nvcc -V); nvidia-smi reports 12.0
pytorch : 2.1.0+cu121
Everything runs inside a Docker container.