Hello folks!
I am trying to clear the GPU cache during multi-GPU training. I am using both torch.cuda.empty_cache() and accelerator.free_memory(), but the GPU memory still gets saturated. torch.cuda.empty_cache() worked for the same code on a single GPU when I wasn't using Accelerate (after deleting the unused variables and calling gc.collect()).
Can someone suggest how to clear the GPU memory on all GPUs when doing multi-GPU training with Accelerate?
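For context, this is roughly the cleanup pattern that brought memory down on a single GPU (a minimal sketch only; unused_tensor is just a placeholder for the variables I delete in my actual loop, not my real code):

import gc
import torch

# placeholder tensor standing in for intermediate results that are no longer needed
unused_tensor = torch.randn(1024, 1024, device="cuda")

# drop the Python reference first, then collect and release cached blocks back to the driver
del unused_tensor
gc.collect()
torch.cuda.empty_cache()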
Hello @sheldon-spock, can you provide a minimal reproducible code example for the issue?
Hi @smangrul, sure:
loss_1, loss_2, loss_3 = stack(batch_input, batch_labels)  # "stack" refers to 2 models applied in series
loss = loss_1 + loss_2 + loss_3
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
# drop the loss tensors, then try to release cached GPU memory
del loss_1, loss_2, loss_3
gc.collect()
torch.cuda.empty_cache()
accelerator.free_memory()
When I was doing this on a single GPU without Accelerate, the GPU memory usage went down significantly after every training step that ended with torch.cuda.empty_cache() (I checked this by printing the memory usage when calling the models). However, I am seeing almost no reduction in memory on the multiple GPUs with Accelerate.
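Something like the sketch below is what I mean by printing the memory per GPU (print_gpu_memory is just an illustrative helper, not the exact code in my script; it uses torch.cuda.memory_allocated / memory_reserved and the process index from the prepared Accelerator):

import torch

def print_gpu_memory(accelerator, tag=""):
    # report allocated and reserved CUDA memory for this process's GPU
    device = accelerator.device
    allocated = torch.cuda.memory_allocated(device) / 1024**2
    reserved = torch.cuda.memory_reserved(device) / 1024**2
    print(f"[rank {accelerator.process_index}] {tag} "
          f"allocated: {allocated:.0f} MiB, reserved: {reserved:.0f} MiB")

# called e.g. right after the cleanup at the end of each step
# print_gpu_memory(accelerator, "after empty_cache")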
Thanks!