After executing the following code on an Apple Silicon chip, memory is still heavily occupied even though the program has ended.
Sorry, I'm a beginner at both English and deep learning. Is this a bug, or am I just doing it the wrong way?
I generally do it like this, but it’s probably not right and there are smarter ways to do it. I’m interested too.
import torch
import gc

del llm_model             # drop the Python reference to the model
gc.collect()              # collect anything kept alive by reference cycles
torch.cuda.empty_cache()  # releases cached blocks on CUDA (no-op on machines without CUDA)
torch.mps.empty_cache()   # Apple Silicon runs on the MPS backend, so release its cache too
Thank you! Actually, del llm_model works; the problem is that the memory monitor didn't refresh.
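One way to verify this, assuming PyTorch 2.x with the MPS backend, is to query the allocator directly instead of relying on the system monitor:

import gc
import torch

print(torch.mps.current_allocated_memory())  # bytes held by tensors on the MPS device

del llm_model            # drop the reference to the model
gc.collect()             # collect reference cycles
torch.mps.empty_cache()  # return cached MPS blocks to the system

print(torch.mps.current_allocated_memory())  # should drop sharply if the weights were freed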
Hmmm, like this?
In your case, anyway, it would be better to move the model out of VRAM first and then delete it.
That said, the torch tensors inside the model are quite stubborn…
If you don't delete the tensors inside the model one by one (not just the pipes or the model object), they may still be referenced from somewhere and never disappear. (A rough sketch of this follows the code below.)
import torch
import gc

llm_model.to("cpu")      # move the weights off the GPU first
del llm_model            # then drop the reference
gc.collect()             # break any reference cycles still pointing at the weights
torch.mps.empty_cache()  # MPS equivalent of torch.cuda.empty_cache() on Apple Silicon
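To illustrate the "one by one" point: if some external reference is keeping the model object alive, you can still free the weight storages themselves. A rough sketch, not an official recipe, that just points every parameter and buffer at an empty tensor:

import torch

# Assuming llm_model is still alive somewhere: swap each tensor's storage
# for an empty one so the original MPS storage can be reclaimed
for t in list(llm_model.parameters()) + list(llm_model.buffers()):
    t.data = torch.empty(0)
torch.mps.empty_cache()  # then release the freed blocks from the cache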
It is more reliable to del any related models and pipes together.
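For example (pipe is a hypothetical pipeline object that internally holds a reference to llm_model; the names are just placeholders):

import gc
import torch

# Deleting only llm_model would not free the weights while pipe still
# references them, so delete both in the same sweep
del llm_model, pipe
gc.collect()             # collect anything kept alive by cycles
torch.mps.empty_cache()  # then release the cached MPS blocks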
Thank you! That's helpful.