Clear GPU memory of transformers.pipeline

What's the best way to clear GPU memory on Hugging Face Spaces? I'm using transformers.pipeline for one of the models; the second is custom. I tried the following:

from transformers import pipeline
import torch

m = pipeline("text-generation", model="xx/xx")
res = m(...)  # prompt elided

del m
torch.cuda.empty_cache()  # return cached blocks to the CUDA driver
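
For reference, a slightly fuller version of the same cleanup also drops the output (which may still hold GPU tensors) and runs Python's garbage collector before emptying the cache. This is only a sketch: empty_cache() cannot return memory that live references still pin.

import gc
import torch

del m, res               # drop every reference that may hold GPU tensors
gc.collect()             # break reference cycles so tensors are actually freed
torch.cuda.empty_cache() # then return cached blocks to the CUDA driver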

What else can I do to free up memory after each call to one of the models?

from numba import cuda

# Tears down the active CUDA context, freeing all device allocations.
# Note: any live torch tensors on that GPU become invalid afterwards.
device = cuda.get_current_device()
device.reset()

For the pipeline this seems to work: GPUtil shows 91% memory utilization before and 0% afterwards, and the model can be rerun multiple times.
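
A minimal way to check this yourself, assuming the GPUtil package is installed:

import GPUtil

# Prints load and memory utilization for each visible GPU
GPUtil.showUtilization()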

I get runtime errors with this approach on Hugging Face Spaces, though.

Another solution that is more elegant and does the cleanup automatically is ray.remote. I wrapped the model inference in a remote function and it works out of the box 🙂
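
A minimal sketch of that wrapping, assuming Ray is installed (the model name and prompt are placeholders). Setting max_calls=1 makes Ray tear down the worker process after each task, which is what actually releases the GPU memory:

import ray
from transformers import pipeline

ray.init()

# num_gpus=1 reserves a GPU for the task; max_calls=1 forces the worker
# process to exit after each call, so its CUDA memory is fully released.
@ray.remote(num_gpus=1, max_calls=1)
def generate(prompt):
    m = pipeline("text-generation", model="xx/xx", device=0)
    return m(prompt)

res = ray.get(generate.remote("Hello"))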