What's the best way to clear GPU memory on Hugging Face Spaces? I'm using transformers.pipeline for one of the models; the second is custom. I tried the following:
import torch
from transformers import pipeline

m = pipeline("text-generation", model="xx/xx")
res = m( .... )
del m                      # drop the Python reference to the pipeline
torch.cuda.empty_cache()   # release cached CUDA memory back to the driver
What else can I do to free up memory after each call to one of the models?
Another, more elegant solution that handles the cleanup automatically is ray.remote. I wrapped the model inference in a remote task and it works out of the box; a sketch of what that can look like is below.
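Roughly, the wrapping can look like the following minimal sketch (the generate function, the prompt, the device=0 placement, and the num_gpus/max_calls settings are illustrative assumptions, not from the original post). The key detail is max_calls=1, which tells Ray to tear down the worker process after each call, and it is the process exit that actually returns the GPU memory to the system:

import ray
from transformers import pipeline

ray.init()

# num_gpus=1 reserves a GPU for the task; max_calls=1 makes Ray
# shut down the worker process after each call, so every byte of
# GPU memory held by that process is released.
@ray.remote(num_gpus=1, max_calls=1)
def generate(prompt):
    m = pipeline("text-generation", model="xx/xx", device=0)
    return m(prompt)

result = ray.get(generate.remote("Hello"))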
This is a very interesting solution which does in fact clear 100% of the memory utilization. However, when I try to rerun or reconstruct my pipeline immediately afterwards, I now get a "CUDA error: invalid argument. CUDA kernel errors might be asynchronously reported at some other API call" message that I cannot resolve. This may be the same runtime error you referred to.
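For what it's worth, the standard way to localize this class of asynchronous failure (a general CUDA debugging step, not a fix specific to this thread) is to make kernel launches synchronous, so the stack trace points at the call that actually failed:

import os
# Must be set before anything initializes CUDA (e.g. before the first
# torch.cuda call); launches then report their errors in place.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"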