I am trying to clear the cache during multi-GPU training. I am using both torch.cuda.empty_cache() and accelerator.free_memory(), but the GPU memory still gets saturated. torch.cuda.empty_cache() worked for the same code on a single GPU when I wasn't using accelerate (after deleting the unused variables and calling gc.collect()).
Can someone suggest how to clear the GPU memory on all GPUs when doing multi-GPU training with accelerate? This is my training step:
```python
loss_1, loss_2, loss_3 = stack(batch_input, batch_labels)  # "stack" refers to 2 models applied in series
loss = loss_1 + loss_2 + loss_3
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()

del loss_1, loss_2, loss_3
gc.collect()
torch.cuda.empty_cache()
accelerator.free_memory()
```
When I was doing this on a single GPU without accelerate, GPU utilization dropped significantly after every training step that ended with torch.cuda.empty_cache() (I checked this by printing GPU utilization when calling the models). With accelerate on multiple GPUs, however, I see almost no reduction in memory.
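For reference, this is a minimal sketch of how I print per-device memory in each process (the helper name report_memory is just illustrative, and it assumes the Accelerator object already exists). Note that torch.cuda.empty_cache() only returns cached, currently unallocated blocks on the calling process's own device, so each rank has to run the cleanup itself:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

def report_memory(tag=""):
    # Each process only sees (and can only clear) its own device.
    device = accelerator.device
    allocated = torch.cuda.memory_allocated(device) / 1e9
    reserved = torch.cuda.memory_reserved(device) / 1e9
    print(f"[rank {accelerator.process_index}] {tag} "
          f"allocated={allocated:.2f} GB, reserved={reserved:.2f} GB")

# Example usage inside the training step:
# report_memory("before empty_cache")
# torch.cuda.empty_cache()
# report_memory("after empty_cache")
```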
Hi @sheldon-spock and @smangrul, I am using accelerate for multiple GPUs but I get a CUDA out-of-memory error. I am using 4 GPUs and I am surprised that I still run out of memory. Do you have any idea how to solve this issue?
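In case it helps, the first things I would try are a smaller per-device batch size, gradient accumulation, and mixed precision. A minimal sketch, assuming a generic model, optimizer, and train_dataloader (the step count and precision choice are only illustrative, not a fix specific to the code above):

```python
from accelerate import Accelerator

# Illustrative settings: accumulate gradients over 4 smaller batches and train in fp16.
accelerator = Accelerator(gradient_accumulation_steps=4, mixed_precision="fp16")

# model, optimizer, and train_dataloader are assumed to be defined elsewhere.
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)

for batch_input, batch_labels in train_dataloader:
    with accelerator.accumulate(model):
        # Placeholder for however the loss is computed in your setup.
        loss = model(batch_input, batch_labels)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```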