GPU memory usage is twice (2x) what I calculated from the number of parameters and floating-point precision

When I do

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125m",
    low_cpu_mem_usage=True,
)

I expect that, since this is an fp32 model with 0.125 billion parameters, the amount of VRAM the model should occupy on the GPU is 4 bytes per parameter × 0.125 billion parameters = 0.5 GB. Instead, nvidia-smi shows about 1000 MiB occupied. What am I missing?
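For reference, here is how I arrived at the 0.5 GB figure, as a quick sketch that just counts the checkpoint's parameters with standard PyTorch calls:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")

# fp32 weights: 4 bytes per parameter
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.3f}B params x 4 bytes = {n_params * 4 / 1024**3:.2f} GiB expected")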

Can you share your full code? I’m seeing 529MB:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125m",
    low_cpu_mem_usage=True,
)

model.to("cuda")

print(f"Used GPU memory: {torch.cuda.memory_allocated() / 1024 / 1024} MB")

Used GPU memory: 529.86328125 MB

Also, do note that the GPU reserves some space for the CUDA context when the driver warms up. It's better to use torch.cuda.memory_allocated() here.

E.g. just allocating a tiny tensor on the GPU will show up as 152 MiB in nvidia-smi:

import torch
import time

# Even a single tiny tensor initializes the full CUDA context
t = torch.tensor([0., 1.]).cuda()

# Keep the process alive so nvidia-smi can be checked from another terminal
time.sleep(10)
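Printing torch.cuda.memory_allocated() for that same tiny tensor makes the contrast explicit (a minimal sketch):

import torch

t = torch.tensor([0., 1.]).cuda()

# The two floats account for well under 1 MiB; the ~152 MiB that nvidia-smi
# shows is the CUDA context, which memory_allocated() does not count.
print(f"Allocated by tensors: {torch.cuda.memory_allocated() / 1024 / 1024} MB")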

thanks @muellerzr, I also get Used GPU memory: 529.86328125 MB when I run

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125m",
    low_cpu_mem_usage=True,
)

model.to("cuda")

print(f"Used GPU memory: {torch.cuda.memory_allocated() / 1024 / 1024} MB")

But nvidia-smi -l reports 980 MiB, so I guess what you're saying is that my GPU is "reserving" roughly twice the model's parameters' worth of memory. What is this for?

More importantly, which one, nvidia-smi -l or torch.cuda.memory_allocated(), is more indicative of when I am about to hit torch.cuda.OutOfMemoryError? Because at the end of the day, I'm just trying to extrapolate what hardware I need for a given model architecture, sequence_length, batch size, and optimizer.

thanks again!

Nope, that's not what I'm saying at all. There's a certain amount of overhead CUDA needs for its drivers and the things it does under the hood. It is far from 2x, otherwise it'd be impossible to train some models :slight_smile: (And it's all usable memory that's available, it just might not be "in use".)
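If you want to put rough numbers on that overhead, you can compare device-wide usage with what PyTorch itself tracks. A sketch using standard torch.cuda calls; note that mem_get_info() counts everything on the device, so run it on an otherwise idle GPU:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m").to("cuda")

free, total = torch.cuda.mem_get_info()    # device-wide, roughly what nvidia-smi counts
allocated = torch.cuda.memory_allocated()  # memory held by live tensors
reserved = torch.cuda.memory_reserved()    # caching allocator's pool (>= allocated)

print(f"Device used (~ nvidia-smi): {(total - free) / 1024**2:.0f} MiB")
print(f"PyTorch allocated:          {allocated / 1024**2:.0f} MiB")
print(f"PyTorch reserved:           {reserved / 1024**2:.0f} MiB")

The gap between the first and second numbers is mostly the CUDA context plus whatever the caching allocator is holding on to.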

torch.cuda.memory_allocated() is the more indicative one; however, in general, if you get a CUDA OOM, that just means that, again, you ran out of CUDA memory. Looking at either one for hints won't really, per se, do much.

After you've gone through the initial parts (so a step or two into training), you can then eyeball the output of nvidia-smi (or the GPU memory allocated % when looking at something like W&B).
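For the hardware-sizing question, the most practical thing is to run a step or two with the batch shape you actually plan to train on and read the peak from torch.cuda.max_memory_allocated(). A minimal sketch, assuming the same checkpoint, a dummy batch, and AdamW (the batch size and sequence length here are arbitrary placeholders):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m").to("cuda")
optimizer = torch.optim.AdamW(model.parameters())

# Dummy batch: batch size 4, sequence length 512 (swap in your real shapes)
input_ids = torch.randint(0, model.config.vocab_size, (4, 512), device="cuda")

for _ in range(2):
    loss = model(input_ids=input_ids, labels=input_ids).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

print(f"Peak GPU memory: {torch.cuda.max_memory_allocated() / 1024**2:.0f} MiB")

The peak will include the weights, gradients, optimizer states, and activations, which is what actually determines whether you OOM.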

