RuntimeError: CUDA out of memory (fix related to PyTorch?)

Hi,

Apologies. I searched, and this error has been covered before, but those topics look more advanced than what I’m able to understand at this point.

I’m trying to run the following command after successfully going through the install procedure from AssemblyAI (see here). I’m just running a basic command with a prompt to synthesise an image using Stable Diffusion:

python scripts/txt2img.py --prompt "goldfish wearing a hat" --plms --ckpt sd-v1-4.ckpt --skip_grid --n_samples 1

It generates the following error:

Creating invisible watermark encoder (see https://github.com/ShieldMnt/invisible-watermark)...
Sampling:   0%|                                       | 0/2 [00:00<?, ?it/s]
Data shape for PLMS sampling is (1, 4, 64, 64)
Running PLMS Sampling with 50 timesteps
PLMS Sampler:   0%|                                  | 0/50 [00:00<?, ?it/s]
data:   0%|                                           | 0/1 [00:00<?, ?it/s]
Sampling:   0%|                                       | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "scripts/txt2img.py", line 344, in <module>
    main()
  File "scripts/txt2img.py", line 295, in main
    samples_ddim, _ = sampler.sample(S=opt.ddim_steps,
  File "/home/sunil/.conda/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/sunil/Developer/stable-diffusion/ldm/models/diffusion/plms.py", line 97, in sample
    samples, intermediates = self.plms_sampling(conditioning, size,
  File "/home/sunil/.conda/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/sunil/Developer/stable-diffusion/ldm/models/diffusion/plms.py", line 152, in plms_sampling
    outs = self.p_sample_plms(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
  File "/home/sunil/.conda/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/sunil/Developer/stable-diffusion/ldm/models/diffusion/plms.py", line 218, in p_sample_plms
    e_t = get_model_output(x, t)
  File "/home/sunil/Developer/stable-diffusion/ldm/models/diffusion/plms.py", line 185, in get_model_output
    e_t_uncond, e_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
  File "/home/sunil/Developer/stable-diffusion/ldm/models/diffusion/ddpm.py", line 987, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/home/sunil/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sunil/Developer/stable-diffusion/ldm/models/diffusion/ddpm.py", line 1410, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/home/sunil/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sunil/Developer/stable-diffusion/ldm/modules/diffusionmodules/openaimodel.py", line 732, in forward
    h = module(h, emb, context)
  File "/home/sunil/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sunil/Developer/stable-diffusion/ldm/modules/diffusionmodules/openaimodel.py", line 85, in forward
    x = layer(x, context)
  File "/home/sunil/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sunil/Developer/stable-diffusion/ldm/modules/attention.py", line 258, in forward
    x = block(x, context=context)
  File "/home/sunil/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sunil/Developer/stable-diffusion/ldm/modules/attention.py", line 209, in forward
    return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
  File "/home/sunil/Developer/stable-diffusion/ldm/modules/diffusionmodules/util.py", line 114, in checkpoint
    return CheckpointFunction.apply(func, len(inputs), *args)
  File "/home/sunil/Developer/stable-diffusion/ldm/modules/diffusionmodules/util.py", line 127, in forward
    output_tensors = ctx.run_function(*ctx.input_tensors)
  File "/home/sunil/Developer/stable-diffusion/ldm/modules/attention.py", line 212, in _forward
    x = self.attn1(self.norm1(x)) + x
  File "/home/sunil/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sunil/Developer/stable-diffusion/ldm/modules/attention.py", line 180, in forward
    sim = einsum('b i d, b j d -> b i j', q, k) * self.scale
  File "/home/sunil/.conda/envs/ldm/lib/python3.8/site-packages/torch/functional.py", line 330, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 7.93 GiB total capacity; 4.04 GiB already allocated; 470.94 MiB free; 4.16 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

It kind of tells me what to do at the bottom: if my reserved memory is much greater than my allocated memory, then “try setting max_split_size_mb to avoid fragmentation”. I’m guessing this is a PyTorch setting? How do I actually set it?
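From what I can tell, max_split_size_mb is an option for PyTorch’s CUDA caching allocator, set through the PYTORCH_CUDA_ALLOC_CONF environment variable. A minimal sketch of what I think is meant (the 128 here is just an illustrative starting value, not a tuned recommendation):

```python
import os

# The caching allocator reads PYTORCH_CUDA_ALLOC_CONF once, at the
# first CUDA allocation, so set it before importing torch -- or
# export it in the shell before running scripts/txt2img.py:
#
#   export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
#
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])  # max_split_size_mb:128
```

Is that the right idea, or does it need to go somewhere else?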

Thanks
Sunny

So I installed nvitop and can confirm that, on average, with a browser and a couple of other apps open, my GPU (a GTX 1070) uses 3–4 GB of its VRAM. I tried rebooting without starting any apps and then running txt2img.py, but even though my memory allocation was XXX after the reboot, it still fails with the “out of memory” error.
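To sanity-check, I plugged the numbers from the error message into some quick arithmetic (values copied straight from the traceback; the last figure is my estimate of what everything outside PyTorch is holding, which roughly matches what nvitop shows):

```python
# Numbers copied from the error message (all converted to MiB):
total_mib     = 7.93 * 1024    # GPU 0 total capacity
allocated_mib = 4.04 * 1024    # already allocated by tensors
reserved_mib  = 4.16 * 1024    # reserved in total by PyTorch
free_mib      = 470.94         # free on the card
request_mib   = 512.00         # the allocation that failed

# The failed request is larger than what is left free, so the
# allocator cannot satisfy it:
print(request_mib > free_mib)  # True

# Estimate of memory held outside PyTorch (desktop, browser, etc.):
other_mib = total_mib - reserved_mib - free_mib
print(round(other_mib))        # ~3390 MiB
```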

Hi,

RuntimeError: CUDA out of memory is a common error raised when a program tries to allocate more memory than is available on your GPU.

Your GPU seems to have 8 GB, but Stable Diffusion seems to need at least 10 GB (please correct me if I’m wrong). You could try booting your machine into a CLI-only session to free some GPU memory, or experiment with the script parameters. I’m not sure if there are alternatives.
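On experimenting with the script parameters: the einsum that fails in your traceback is building the self-attention similarity matrix over the latent, which your log reports as (1, 4, 64, 64) for a 512×512 image (the latent is 1/8 of the output size on each side). If your copy of txt2img.py exposes the CompVis --H/--W flags (I’m assuming it does), lowering them helps a lot, since that matrix grows with the fourth power of the image side. Rough arithmetic:

```python
# Attention tokens = (H/8) * (W/8); the similarity matrix in the
# failing einsum is tokens x tokens (per batch entry and head).
def sim_matrix_elems(h, w):
    tokens = (h // 8) * (w // 8)
    return tokens * tokens

at_512 = sim_matrix_elems(512, 512)   # 4096 x 4096 entries
at_256 = sim_matrix_elems(256, 256)   # 1024 x 1024 entries

print(at_512 // at_256)  # 16: halving H and W cuts this matrix ~16x
```

I believe the script wants --H and --W to be multiples of 64, so something like 384×384 is a reasonable middle ground to try.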

Hi Lucas, thanks for the reply.

I read somewhere that the minimum VRAM requirement was 4 GB, but the official docs say otherwise! I’ll have a play with Linux and see if I can get the default VRAM usage down, thanks for the suggestion. Maybe it’s time to upgrade that GPU!

Cheers


Alas, I was told a minimum of 6 GB was required, but it does seem 10 GB is more accurate. Is there a trick to running this on lower GPU VRAM?

I’ve seen some tutorials on YouTube for getting it working with lower VRAM; I think 4 GB may now be possible. I haven’t tested it, though.
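From what I remember, most of those low-VRAM approaches boil down to half precision: storing the weights as 16-bit floats instead of 32-bit halves the model’s footprint. Very rough numbers, assuming around a billion weights for the v1 checkpoint (my estimate, not a figure from this thread):

```python
params = 1_000_000_000          # assumed weight count (rough estimate)

fp32_gib = params * 4 / 2**30   # 4 bytes per float32 weight
fp16_gib = params * 2 / 2**30   # 2 bytes per float16 weight

print(round(fp32_gib, 2))  # 3.73 GiB just for the weights in fp32
print(round(fp16_gib, 2))  # 1.86 GiB in fp16
```

That saving is before activations and attention buffers, which is why the tutorials usually combine fp16 with smaller image sizes.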

Links to the tutorials would be appreciated, at least as a jumping-off point. Thanks!