I am trying to clear the GPU cache during multi-GPU training. I am calling accelerator.free_memory(), but GPU memory still gets saturated.
torch.cuda.empty_cache() worked for the same code on a single GPU when I wasn't using Accelerate (after deleting the unused variables and then calling it).
Can someone suggest how to clear the GPU memory across all GPUs when doing multi-GPU training with Accelerate?
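For reference, per-process memory can be inspected like this. A minimal sketch, assuming a standard Accelerate/DDP setup where each process owns one GPU; `report_memory` is a hypothetical helper, not an Accelerate API:

```python
import torch

def report_memory(tag=""):
    """Hypothetical helper: print allocated vs. reserved memory for the
    GPU owned by the current process (one GPU per process under DDP)."""
    if not torch.cuda.is_available():
        print(f"{tag}: no CUDA device")
        return
    dev = torch.cuda.current_device()
    alloc = torch.cuda.memory_allocated(dev) / 2**20   # live tensors, MiB
    reserved = torch.cuda.memory_reserved(dev) / 2**20  # cached by allocator, MiB
    print(f"{tag}: device {dev}: {alloc:.1f} MiB allocated, {reserved:.1f} MiB reserved")
```

Calling it before and after a training step (in every process) shows whether memory is held by live tensors or merely cached by the allocator.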
Hello @sheldon-spock, can you provide a minimal reproducible example code for the issue?
Hi @smangrul, sure:
```python
# "stack" refers to two models applied in series
loss_1, loss_2, loss_3 = stack(batch_input, batch_labels)
loss = loss_1 + loss_2 + loss_3
del loss_1, loss_2, loss_3
```
When I ran this on a single GPU without Accelerate, GPU memory usage went down significantly after every training step ending with torch.cuda.empty_cache() (I checked this by printing memory usage around the model calls). However, I am getting almost no reduction in memory on the multiple GPUs with Accelerate.
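In case it helps, the single-GPU pattern above (delete the tensors, then empty the cache) has to run in every process under multi-GPU Accelerate, since each process drives one GPU. A minimal sketch; `clear_memory` is a hypothetical helper, while `accelerator.free_memory()` and `torch.cuda.empty_cache()` are the real calls:

```python
import gc
import torch

def clear_memory(accelerator=None):
    """Hypothetical helper: release what the current process can.
    Run it in every process so that every GPU is covered."""
    if accelerator is not None:
        accelerator.free_memory()    # drop Accelerate's internal references
    gc.collect()                     # collect Python-level garbage first
    if torch.cuda.is_available():
        torch.cuda.empty_cache()     # return cached blocks to the driver
```

Note that empty_cache() only releases blocks the caching allocator no longer uses: if a tensor (e.g. a loss still attached to the autograd graph) is alive in any process, that memory will not go down until the reference is deleted there.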