GPU memory usage is twice (2x) what I calculated from the number of parameters and floating-point precision

When I do

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125m",
    low_cpu_mem_usage=True,
)

I expect that, since this is an fp32 model with 0.125 billion parameters, the amount of VRAM the model should occupy on the GPU is 4 bytes per parameter × 0.125 billion parameters = 0.5 GB. Instead, nvidia-smi shows about 1000 MiB occupied. What am I missing?
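For reference, here is how I arrived at the 0.5 GB figure, as a quick sketch that just counts the checkpoint's parameters with standard PyTorch calls:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")

# fp32 weights: 4 bytes per parameter
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.3f}B params x 4 bytes = {n_params * 4 / 1024**3:.2f} GiB expected")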

Can you share your full code? I’m seeing 529MB:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125m",
    low_cpu_mem_usage=True,
)

model.to("cuda")

print(f"Used GPU memory: {torch.cuda.memory_allocated() / 1024 / 1024} MB")

Used GPU memory: 529.86328125 MB

Also, do note that the GPU reserves some space for the CUDA context when the driver warms up. It's better to use torch.cuda.memory_allocated() here.

E.g. just allocating a tiny tensor on the GPU will show up as 152 MiB in nvidia-smi:

import torch
import time

# Even a single tiny tensor initializes the full CUDA context
t = torch.tensor([0., 1.]).cuda()

# Keep the process alive so nvidia-smi can be checked from another terminal
time.sleep(10)
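Printing torch.cuda.memory_allocated() for that same tiny tensor makes the contrast explicit (a minimal sketch):

import torch

t = torch.tensor([0., 1.]).cuda()

# The two floats account for well under 1 MiB; the ~152 MiB that nvidia-smi
# shows is the CUDA context, which memory_allocated() does not count.
print(f"Allocated by tensors: {torch.cuda.memory_allocated() / 1024 / 1024} MB")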

thanks @muellerzr, I also get Used GPU memory: 529.86328125 MB when I run

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125m",
    low_cpu_mem_usage=True,
)

model.to("cuda")

print(f"Used GPU memory: {torch.cuda.memory_allocated() / 1024 / 1024} MB")

But nvidia-smi -l reports 980 MiB, so I guess what you're saying is that my GPU is "reserving" roughly twice the model's parameters' worth of memory. What is this for?

More importantly, which one, nvidia-smi -l or torch.cuda.memory_allocated(), is more indicative of when I am about to hit torch.cuda.OutOfMemoryError? Because at the end of the day, I'm just trying to extrapolate what hardware I need for a given model architecture, sequence_length, batch size, and optimizer.

thanks again!

Nope, that's not what I'm saying at all. There's a certain amount of overhead CUDA needs for its drivers and the things it does under the hood. It is far from 2x, otherwise it'd be impossible to train some models :slight_smile: (And it's all usable memory that's available, it just might not be "in use".)
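If you want to put rough numbers on that overhead, you can compare device-wide usage with what PyTorch itself tracks. A sketch using standard torch.cuda calls; note that mem_get_info() counts everything on the device, so run it on an otherwise idle GPU:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m").to("cuda")

free, total = torch.cuda.mem_get_info()    # device-wide, roughly what nvidia-smi counts
allocated = torch.cuda.memory_allocated()  # memory held by live tensors
reserved = torch.cuda.memory_reserved()    # caching allocator's pool (>= allocated)

print(f"Device used (~ nvidia-smi): {(total - free) / 1024**2:.0f} MiB")
print(f"PyTorch allocated:          {allocated / 1024**2:.0f} MiB")
print(f"PyTorch reserved:           {reserved / 1024**2:.0f} MiB")

The gap between the first and second numbers is mostly the CUDA context plus whatever the caching allocator is holding on to.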

torch.cuda.memory_allocated() is the more indicative one; however, in general, if you get a CUDA OOM, that just means that, again, you ran out of CUDA memory. Looking at either one for hints won't really, per se, do much.

After you've gone through the initial parts (so a step or two into training), you can then eyeball the output of nvidia-smi (or the GPU memory allocated % when looking at something like W&B).
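For the hardware-sizing question, the most practical thing is to run a step or two with the batch shape you actually plan to train on and read the peak from torch.cuda.max_memory_allocated(). A minimal sketch, assuming the same checkpoint, a dummy batch, and AdamW (the batch size and sequence length here are arbitrary placeholders):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m").to("cuda")
optimizer = torch.optim.AdamW(model.parameters())

# Dummy batch: batch size 4, sequence length 512 (swap in your real shapes)
input_ids = torch.randint(0, model.config.vocab_size, (4, 512), device="cuda")

for _ in range(2):
    loss = model(input_ids=input_ids, labels=input_ids).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

print(f"Peak GPU memory: {torch.cuda.max_memory_allocated() / 1024**2:.0f} MiB")

The peak will include the weights, gradients, optimizer states, and activations, which is what actually determines whether you OOM.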

