Loading a model takes a lot of RAM, and moving it to CUDA doesn't free that RAM

I’m trying to fine-tune a 1.3B-parameter model, so I’m looking for ways to optimize RAM usage.

I noticed that the model takes a lot of RAM right after it is loaded.

After the model is loaded:
11.51 GB total memory used
0.0 GB used by torch objects on GPU
2 MiB total mem used on GPU
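
Roughly, this is how I take these measurements (a minimal sketch; the model name and the `report_memory` helper are placeholders, the exact cells are in the notebook):

```python
import psutil
import torch
from transformers import AutoModelForCausalLM

def report_memory(tag):
    # Total system RAM currently in use, in GiB
    ram_used = psutil.virtual_memory().used / 1024**3
    # Memory occupied by torch tensors on the GPU, in GiB
    torch_gpu = torch.cuda.memory_allocated() / 1024**3
    # Memory reserved on the GPU by the caching allocator, in GiB
    reserved_gpu = torch.cuda.memory_reserved() / 1024**3
    print(f"{tag}: {ram_used:.2f} GiB RAM used, "
          f"{torch_gpu:.2f} GiB torch tensors on GPU, "
          f"{reserved_gpu:.2f} GiB reserved on GPU")

report_memory("before load")
# placeholder: any ~1.3B-parameter checkpoint shows the same pattern
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
report_memory("after load")
```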

And when I move it to the GPU, it:
a) takes only ~5 GB of VRAM (perhaps another 1.3 GB is taken by Torch itself);
b) doesn’t free any RAM; in fact it takes about 2.5 GB more.
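
The move itself is nothing special (again a sketch, reusing the hypothetical `report_memory` helper from above):

```python
report_memory("before .cuda()")
model = model.cuda()            # copies all parameters and buffers to the GPU
report_memory("after .cuda()")  # VRAM goes up as expected, but RAM does not go down
```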

So the problems I see are:
a) the model occupies much more space in RAM than in VRAM;
b) the RAM is not freed when the model is moved to CUDA.

You can check this behavior in this notebook: Google Colab

You will need a “Large memory” instance, because the transfer to CUDA even overshoots the 13 GB RAM limit. I use Torch 1.7.0+cu110 since the instance has CUDA 11.2, but with the default 1.9.0+cu102 the behavior is more or less the same.

Running the Python garbage collector manually doesn’t help either.
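
For completeness, this is roughly what I tried (a sketch, not the exact notebook cells):

```python
import gc

gc.collect()               # force a full garbage collection of CPU-side objects
torch.cuda.empty_cache()   # release cached blocks held by the CUDA allocator
report_memory("after gc")  # RAM usage stays essentially the same
```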

Any thoughts on this?