Thanks @muellerzr, I also get `Used GPU memory: 529.86328125 MB` when I run:
```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125m",
    low_cpu_mem_usage=True,
)
model.to("cuda")
print(f"Used GPU memory: {torch.cuda.memory_allocated() / 1024 / 1024} MB")
```
But `nvidia-smi -l` reports 980MiB, so I guess what you're saying is that my GPU is "reserving" roughly twice the model's parameter memory. What's this for?
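For context, here's a quick sanity check I can run to compare PyTorch's allocator counters against the device-level totals. As far as I understand, `torch.cuda.mem_get_info()` reflects the driver's view, which should be close to what `nvidia-smi` shows (CUDA context included), but correct me if that's wrong:

```python
import torch

# Allocator's view: bytes occupied by live tensors vs. the larger pool
# that PyTorch's caching allocator has reserved from the driver.
allocated_mb = torch.cuda.memory_allocated() / 1024**2
reserved_mb = torch.cuda.memory_reserved() / 1024**2

# Driver's view: free/total bytes on the whole device, which should be
# close to what nvidia-smi reports (CUDA context included).
free_b, total_b = torch.cuda.mem_get_info()
device_used_mb = (total_b - free_b) / 1024**2

print(f"allocated: {allocated_mb:.0f} MB | reserved: {reserved_mb:.0f} MB | device used: {device_used_mb:.0f} MB")
```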
More importantly, which one, `nvidia-smi -l` or `torch.cuda.memory_allocated()`, is more indicative of when I'm about to hit `torch.cuda.OutOfMemoryError`? Because at the end of the day, I'm just trying to extrapolate what hardware I need for a given model architecture, sequence length, batch size, and optimizer.
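For what it's worth, my rough mental model for fp32 training with Adam is weights + gradients + two optimizer moment buffers, i.e. about 4x the parameter memory before activations. Here's a back-of-envelope sketch (the 4-copies assumption is mine, not something from the docs, so please correct it if it's off):

```python
def estimate_training_memory_gb(n_params: float, bytes_per_param: int = 4) -> float:
    # Assumes fp32 weights + fp32 gradients + two Adam moment buffers,
    # i.e. 4 copies of the parameters. Activations are NOT included
    # (those scale with batch_size * sequence_length and can dominate).
    weights = n_params * bytes_per_param
    grads = n_params * bytes_per_param
    optimizer_states = 2 * n_params * bytes_per_param
    return (weights + grads + optimizer_states) / 1024**3

# gpt-neo-125m has ~125M parameters
print(f"~{estimate_training_memory_gb(125e6):.1f} GB before activations")
```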
Thanks again!