What's the best way to clear the GPU memory on Hugging Face Spaces? I'm using
transformers.pipeline for one of the models; the second is custom. I tried the following:
from transformers import pipeline
m = pipeline("text-generation", model="xx/xx")
res = m( .... )
What else can I do to free up memory after each call to one of the models?
from numba import cuda
device = cuda.get_current_device()
device.reset()  # tears down the CUDA context on this device, releasing its memory
For the pipeline this seems to work. GPUtil shows 91% utilization before and 0% utilization afterwards, and the model can be rerun multiple times.
I get runtime errors with this on Hugging Face Spaces, though.
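If resetting the device is what triggers the errors, a gentler option is to drop the references to the model and let the framework release its cached memory. A minimal sketch, assuming the models run on PyTorch; release_cuda_cache is a made-up helper name, and torch is imported defensively so the snippet also runs on a machine without it:

```python
import gc

try:
    import torch  # assumption: the models run on PyTorch
except ImportError:
    torch = None


def release_cuda_cache():
    """Collect garbage, then hand cached CUDA blocks back to the driver.

    Returns the bytes still held by live tensors (0 when CUDA is unavailable).
    """
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
        return torch.cuda.memory_allocated()
    return 0


# Usage after a call: drop every reference to the model first, otherwise
# its weights stay allocated and empty_cache() has nothing to release.
#   del m
#   remaining = release_cuda_cache()
```

As far as I know this will not bring the reading down to 0% the way a reset does, because the CUDA context itself keeps some memory for as long as the process lives.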
Another, more elegant solution that does the cleanup automatically is
ray.remote. I wrapped the model inference in a remote task, and it works out of the box.
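My understanding of why this cleans up is that the inference runs in a separate worker process, and the driver frees that process's GPU memory when it exits. The same idea can be sketched with only the standard library. This is not Ray's API, just a subprocess stand-in; run_isolated and CHILD_CODE are made-up names, and the child only echoes the prompt where the real pipeline call would go:

```python
import json
import subprocess
import sys

# Hypothetical child script. In a real Space the commented lines would build
# the pipeline and run it; here the child only echoes the prompt so the
# sketch runs anywhere.
CHILD_CODE = """
import json, sys
prompt = json.loads(sys.stdin.read())
# from transformers import pipeline
# m = pipeline("text-generation", model="xx/xx")
# result = m(prompt)
result = {"echo": prompt}
print(json.dumps(result))
"""


def run_isolated(prompt):
    """Run one inference call in a short-lived child process.

    Whatever memory the child allocated is released when it exits.
    """
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=json.dumps(prompt),  # prompt goes in on stdin as JSON
        capture_output=True,
        text=True,
        check=True,  # raise if the child fails
    )
    return json.loads(proc.stdout)  # result comes back on stdout as JSON
```

The trade-off is that the model is loaded again on every call, so this only pays off when calls are infrequent or the two models cannot fit in memory together.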