GPU memory usage is twice (2x) what I calculated based on number of parameters and floating point precision

Thanks @muellerzr, I also get Used GPU memory: 529.86328125 MB when I run

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125m",
    low_cpu_mem_usage=True,
)

model.to("cuda")

# memory_allocated() only counts memory held by tensors PyTorch itself
# allocated; it does not include the CUDA context or the allocator's cache
print(f"Used GPU memory: {torch.cuda.memory_allocated() / 1024 / 1024} MB")

But nvidia-smi -l reports 980 MiB, so I guess what you're saying is that the process is "reserving" roughly twice the model's parameter footprint in memory. What is this extra memory for?
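
If it helps, here is a quick sketch I put together to print the two counters side by side (run right after the snippet above; my understanding, which may be off, is that memory_reserved() tracks what PyTorch's caching allocator has grabbed from the driver, and that nvidia-smi additionally counts the CUDA context):

import torch

# live tensor memory (model weights, buffers, etc.)
allocated_mb = torch.cuda.memory_allocated() / 1024**2

# memory the caching allocator has reserved from the driver; usually
# >= allocated, and closer to the nvidia-smi number once the CUDA
# context overhead is subtracted
reserved_mb = torch.cuda.memory_reserved() / 1024**2

print(f"allocated: {allocated_mb:.2f} MB, reserved: {reserved_mb:.2f} MB")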

More importantly, which of the two, nvidia-smi -l or torch.cuda.memory_allocated(), is the better indicator of when I'm about to hit a torch.cuda.OutOfMemoryError? At the end of the day, I'm just trying to extrapolate what hardware I need for a given model architecture, sequence_length, batch size, and optimizer.
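
For context, this is the rough back-of-envelope estimate I have been using so far (a sketch only: estimate_training_memory_mb is a name I made up, it assumes full fp32 training with Adam, and it deliberately ignores activations, which is exactly the part that depends on sequence_length and batch size):

def estimate_training_memory_mb(n_params: int, bytes_per_param: int = 4) -> float:
    # weights + gradients + Adam's two moment buffers = 4 copies of the
    # parameters; activation memory is NOT counted here
    weights = n_params * bytes_per_param
    grads = n_params * bytes_per_param
    adam_states = 2 * n_params * bytes_per_param  # exp_avg + exp_avg_sq
    return (weights + grads + adam_states) / 1024**2

# EleutherAI/gpt-neo-125m has roughly 125M parameters
print(f"~{estimate_training_memory_mb(125_000_000):.0f} MB before activations")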

Thanks again!