from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125m",
    low_cpu_mem_usage=True
)
model.to("cuda")  # move the weights onto the GPU so they show up in nvidia-smi
I expect that, since this is an fp32 model with 0.125 billion parameters, the amount of VRAM the model should occupy on the GPU is: 4 bytes per parameter x 0.125 billion parameters = 0.5 GB. Instead, nvidia-smi shows about 1000 MiB occupied. What am I missing?
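For reference, this is the sanity check I'm running right after the snippet above (a sketch; the printouts are just illustrative), comparing the 0.5 GB estimate against what PyTorch itself reports:

import torch

n_params = sum(p.numel() for p in model.parameters())
print(f"params: {n_params/1e9:.3f} B -> expected {n_params * 4 / 2**30:.2f} GiB in fp32")
print(f"torch allocated: {torch.cuda.memory_allocated()/2**20:.0f} MiB")  # bytes held by tensors
print(f"torch reserved:  {torch.cuda.memory_reserved()/2**20:.0f} MiB")   # caching allocator's pool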
But nvidia-smi -l reports 980 MiB, so I guess what you're saying is that my GPU is "reserving" roughly twice the model's parameters in memory. What's this for?
More importantly, which one, nvidia-smi -l or torch.cuda.memory_allocated(), is more indicative of when I'm about to hit a torch.cuda.OutOfMemoryError? Because at the end of the day, I'm just trying to extrapolate what hardware I need for a given model architecture, sequence_length, batch size, and optimizer.
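For concreteness, this is the kind of back-of-the-envelope arithmetic I'd like to validate (a sketch assuming fp32 weights with Adam; estimate_training_mib is just a made-up helper, and activations are deliberately left out):

def estimate_training_mib(n_params, bytes_per_param=4, optimizer_slots=2):
    """Weights + grads + optimizer states only; activations and CUDA overhead excluded."""
    weights = n_params * bytes_per_param
    grads = n_params * bytes_per_param
    optimizer_states = n_params * bytes_per_param * optimizer_slots  # Adam keeps m and v
    return (weights + grads + optimizer_states) / 2**20

# 0.125B parameters in fp32 with Adam: ~1907 MiB before activations,
# which additionally scale with batch_size * sequence_length * hidden_size.
print(f"{estimate_training_mib(0.125e9):.0f} MiB")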
Nope, that's not what I'm saying at all. There's a certain amount of overhead the CUDA context needs for the things it does under the hood with its drivers. It's far from 2x, otherwise it'd be impossible to train some models. (And it's all usable memory that's available, it just might not be "in use".)
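If you want to see that overhead directly, here's a minimal sketch (assuming yours is the only process on the GPU, since mem_get_info is device-wide):

import torch

torch.cuda.init()                          # force creation of the CUDA context
free, total = torch.cuda.mem_get_info()    # device-level view, roughly what nvidia-smi reports
allocated = torch.cuda.memory_allocated()  # bytes PyTorch has actually handed to tensors
print(f"device in use: {(total - free)/2**20:.0f} MiB")
print(f"torch tensors: {allocated/2**20:.0f} MiB")
# the gap is mostly CUDA context/driver overhead plus PyTorch's cached (reserved) blocks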
This is more indicative; however, in general, if you get a CUDA OOM it just means that, again, you ran out of CUDA memory. Looking at either one for hints won't really, per se, do much.
After you've gone through the initial parts (so a step or two in), you can eyeball the output of nvidia-smi (or the GPU memory allocated % when looking at something like W&B).
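A sketch of that "eyeball it a step or two in" idea in code (the batch shape (8, 512), the random input_ids, and AdamW are placeholders for your actual setup):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m").to("cuda")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

torch.cuda.reset_peak_memory_stats()
input_ids = torch.randint(0, model.config.vocab_size, (8, 512), device="cuda")
for _ in range(2):  # a step or two in, memory use has roughly stabilised
    loss = model(input_ids, labels=input_ids).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

print(f"peak allocated: {torch.cuda.max_memory_allocated()/2**20:.0f} MiB")
print(f"reserved:       {torch.cuda.memory_reserved()/2**20:.0f} MiB")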