Clear GPU memory of transformers.pipeline

What's the best way to clear the GPU memory on Hugging Face Spaces? I'm using transformers.pipeline for one of the models; the second is custom. I tried the following:

from transformers import pipeline
import torch

m = pipeline("text-generation", model="xx/xx")
res = m(....)

del m
torch.cuda.empty_cache()

What else can I do to free up memory after each call to one of the models?

from numba import cuda

# Resetting the device releases everything that was allocated on it
device = cuda.get_current_device()
device.reset()

For the pipeline this seems to work: GPUtil shows 91% memory utilization before and 0% afterwards, and the model can be rerun multiple times.
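For reference, the numbers above come from a check along these lines (a minimal sketch, assuming the GPUtil package is installed):

import GPUtil

# Print the memory utilization of the first GPU as a percentage
gpu = GPUtil.getGPUs()[0]
print(f"GPU memory utilization: {gpu.memoryUtil:.0%}")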

I get runtime errors with this on Hugging Face Spaces, though.


Another solution that is more elegant and does the cleanup automatically is ray.remote. I wrapped the model inference in a remote task and it works out of the box 🙂
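A minimal sketch of that pattern, assuming the same text-generation pipeline as above (the model name and prompt are placeholders, and num_gpus=1 is an assumption for a single-GPU Space):

import ray
from transformers import pipeline

ray.init()

# The pipeline is built inside the remote task, so its GPU memory lives
# in the Ray worker process rather than in the main process.
@ray.remote(num_gpus=1)
def generate(prompt):
    m = pipeline("text-generation", model="xx/xx", device=0)
    return m(prompt)

res = ray.get(generate.remote("some prompt"))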

This is a very interesting solution which does in fact free 100% of the memory. However, when I try to run or reconstruct my pipeline immediately afterwards, I now get a "CUDA error: invalid argument. CUDA kernel errors might be asynchronously reported at some other API call" message which I cannot resolve. This may be the same runtime error you referred to.


@canthony You probably need to wrap everything inside the ray.remote function and set max_calls=1 to ensure that the worker is not reused.

See the example here: app.py · simonduerr/ProteinMPNN at 21af4a534fd0c9f767228c87028f8fe50e7a6179
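A sketch of the same remote function as above with max_calls=1 added (the decorator arguments and names here are assumptions for illustration, not copied from the linked app.py):

import ray
from transformers import pipeline

ray.init()

# max_calls=1 forces a fresh worker for every call; the worker exits
# afterwards, so no CUDA state is carried over between invocations.
@ray.remote(num_gpus=1, max_calls=1)
def generate(prompt):
    m = pipeline("text-generation", model="xx/xx", device=0)
    return m(prompt)

res = ray.get(generate.remote("some prompt"))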
