Can you share your full code? I'm seeing 529 MB:
```python
import torch
from transformers import AutoModelForCausalLM

# low_cpu_mem_usage avoids materializing a second full copy of the
# weights in CPU RAM while the checkpoint is being loaded
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125m",
    low_cpu_mem_usage=True,
)
model.to("cuda")

# memory_allocated() reports bytes currently held by tensors on the GPU
print(f"Used GPU memory: {torch.cuda.memory_allocated() / 1024 / 1024} MB")
```
Output:

```
Used GPU memory: 529.86328125 MB
```
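That figure is roughly what I'd expect: ~125M parameters in float32 is 125e6 × 4 bytes ≈ 500 MB for the weights alone, and the allocation also includes non-parameter buffers. If dtype is the variable between our setups, here's a minimal sketch for comparing against the half-precision footprint; it uses the standard `torch_dtype` argument to `from_pretrained`, and the ~250 MB in the comment is my back-of-the-envelope estimate, not a measured number:

```python
import torch
from transformers import AutoModelForCausalLM

# Same model, loaded in half precision: each weight takes 2 bytes instead of 4
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125m",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
)
model.to("cuda")

# Sanity-check the parameter count behind the arithmetic above
n_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {n_params / 1e6:.1f}M")

# ~125e6 params * 2 bytes ≈ 250 MB expected for the weights alone
print(f"Used GPU memory: {torch.cuda.memory_allocated() / 1024 / 1024} MB")
```

Exact numbers will differ a bit from the estimate because of buffers and allocator granularity, but if we're both measuring with `memory_allocated()` at the same dtype, the results should be close.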