Also note that the driver reserves some GPU memory when the CUDA context is initialized, so nvidia-smi overstates what your tensors actually use. It's better to use torch.cuda.memory_allocated() here.
E.g. just allocating a tiny tensor on the GPU will show 152MiB in nvidia-smi:
import time
import torch

t = torch.tensor([0.0, 1.0]).cuda()  # first CUDA op initializes the context
time.sleep(10)  # keep the process alive so nvidia-smi can observe it
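While that process sleeps, you can compare what PyTorch itself reports. A minimal sketch, assuming a CUDA-capable machine (the printed numbers will vary by GPU and driver; on a machine without CUDA the guarded block simply does nothing):

```python
import torch

# memory_allocated(): bytes actually occupied by live tensors.
# memory_reserved(): bytes held by PyTorch's caching allocator.
# Neither includes the CUDA context overhead that nvidia-smi shows.
if torch.cuda.is_available():
    t = torch.tensor([0.0, 1.0]).cuda()
    print(torch.cuda.memory_allocated())  # tiny: just the 2-element tensor
    print(torch.cuda.memory_reserved())   # larger: the allocator's cached pool
```

The gap between memory_allocated() and the nvidia-smi figure is the context plus caching-allocator overhead, which is why memory_allocated() is the right tool for measuring your own tensors.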