Clear GPU memory of transformers.pipeline

simonduerr · May 24, 2022, 2:46pm

What's the best way to clear the GPU memory on Hugging Face Spaces? I'm using transformers.pipeline for one of the models; the second is custom. I tried the following:

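import torch
from transformers import pipeline

m = pipeline("text-generation", model="xx/xx")
res = m( ....    )
del m  # drop the reference to the pipeline
torch.cuda.empty_cache()  # release cached GPU blocks that are no longer in use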

What else can I do to free up memory after each call to one of the models?
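
A minimal sketch of a slightly more thorough cleanup, assuming PyTorch is the backend. `torch.cuda.empty_cache()` can only release cached blocks that no live tensor still occupies, so it can help to drop every reference to the pipeline and run the garbage collector before emptying the cache. The helper name `release_cuda_cache` is hypothetical, and the import guard is there only so the sketch also runs on a machine without PyTorch. Even after this, the CUDA context itself stays allocated until the process exits, so tools such as nvidia-smi will not report zero usage.

```python
import gc

try:
    import torch
except ImportError:  # guard so the sketch runs without PyTorch installed
    torch = None


def release_cuda_cache() -> bool:
    """Run the garbage collector, then empty PyTorch's CUDA cache.

    Returns True if a CUDA cache was emptied, False if CUDA is unavailable.
    """
    # Collect unreachable objects (including reference cycles) so their
    # tensors are freed before the cache is emptied.
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        # Hands cached blocks that no tensor is using back to the driver.
        torch.cuda.empty_cache()
        return True
    return False


# Usage: drop every reference to the pipeline first, then clean up.
m = object()  # hypothetical stand-in for pipeline("text-generation", ...)
del m
release_cuda_cache()
```

Any other variable that still points at the model or its outputs on the GPU keeps that memory alive, so those references need to go as well.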

simonduerr · May 25, 2022, 9:15am

from numba import cuda

device = cuda.get_current_device()
device.reset()  # destroys the CUDA context on this device, freeing its memory

For the pipeline this seems to work. GPUtil shows 91% utilization before and 0% utilization afterwards, and the model can be rerun multiple times.

I get runtime errors with this on Hugging Face Spaces, though.

simonduerr · May 25, 2022, 11:39am

A more elegant solution that does the cleanup automatically is ray.remote. I wrapped the model inference in a remote function and it works out of the box.

canthony · March 27, 2023, 4:32pm

This is a very interesting solution which does in fact clear up 100% of memory utilization. However, when I try to run or reconstruct my pipeline immediately after that, I now get a “CUDA error: invalid argument. CUDA kernel errors might be asynchronously reported at some other API call” message, which I cannot resolve. This may be the same runtime error you referred to.

simonduerr · March 27, 2023, 4:45pm

@canthony You probably need to wrap everything inside the ray.remote function and set max_calls=1 to ensure that the worker process is not going to be reused.

See the example here: app.py · simonduerr/ProteinMPNN at 21af4a534fd0c9f767228c87028f8fe50e7a6179

markba · January 24, 2025, 4:08pm

with torch.no_grad():  # disable gradient tracking during inference
    res = m( ....    )

danfperam · March 19, 2025, 2:03pm

As I understand it, you are loading your model on each ray.remote call, correct? Why not pass the model object as an argument to the ray.remote function?
