Clear Cache with Accelerate

Hello folks!

I am trying to clear the GPU cache during multi-GPU training. I call both torch.cuda.empty_cache() and accelerator.free_memory(), but GPU memory still fills up. torch.cuda.empty_cache() worked for the same code on a single GPU when I wasn’t using Accelerate (after deleting the unused variables and calling gc.collect()).

Can someone suggest how to clear the GPU memory on all GPUs when doing multi-GPU training with Accelerate?

Hello @sheldon-spock, can you provide a minimal reproducible example for the issue?

Hi @smangrul, sure:

# "stack" refers to two models applied in series
loss_1, loss_2, loss_3 = stack(batch_input, batch_labels)
loss = loss_1 + loss_2 + loss_3
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
del loss_1, loss_2, loss_3
gc.collect()
torch.cuda.empty_cache()
accelerator.free_memory()

When I did this on a single GPU without Accelerate, GPU memory usage dropped significantly after every training step that ended with torch.cuda.empty_cache() (I checked this by printing GPU utilization when calling the models). However, with Accelerate on multiple GPUs I see almost no reduction in memory.

Thanks!
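
To make the comparison above easier to reproduce, here is a minimal sketch that reports memory for the current process before and after cleanup. It separates memory held by live tensors (torch.cuda.memory_allocated) from memory kept by PyTorch's caching allocator (torch.cuda.memory_reserved); torch.cuda.empty_cache() can only release the unused cached part. The helper names cuda_memory_mib and release_cached_memory are hypothetical, not part of PyTorch or Accelerate, and this is an illustration under those assumptions rather than a confirmed fix for the issue.

```python
import gc

import torch


def cuda_memory_mib(device=None):
    """Return (allocated, reserved) memory in MiB for this process on `device`."""
    if not torch.cuda.is_available():
        # No GPU visible: report zeros so the sketch still runs on CPU-only machines.
        return 0.0, 0.0
    mib = 1024 ** 2
    return (
        torch.cuda.memory_allocated(device) / mib,  # held by live tensors
        torch.cuda.memory_reserved(device) / mib,   # held by the caching allocator
    )


def release_cached_memory(device=None):
    """Run the garbage collector, release unused cached blocks, return both readings."""
    before = cuda_memory_mib(device)
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    after = cuda_memory_mib(device)
    return before, after


if __name__ == "__main__":
    before, after = release_cached_memory()
    print(f"allocated/reserved before: {before[0]:.1f}/{before[1]:.1f} MiB")
    print(f"allocated/reserved after:  {after[0]:.1f}/{after[1]:.1f} MiB")
```

In a multi-GPU run each process would call this after its own del statements, and prefixing the printout with accelerator.process_index shows which process each reading belongs to. If the allocated figure stays high after cleanup, something still references the tensors (model parameters, gradients, and optimizer state, for example), and emptying the cache cannot free that memory.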