OutOfMemoryError when running pipeline.to("cuda")

I am trying to reproduce the Diffusers quick guide and got the following error when running pipeline.to("cuda"). Which part of the pipeline class should I modify to solve this issue?
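For context, the snippet I am running looks roughly like this (reconstructed from the guide, not an exact copy; the model id is the one the guide uses as far as I recall):

```python
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.to("cuda")  # <- this is the call that raises OutOfMemoryError
```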

OutOfMemoryError Traceback (most recent call last)
Cell In[4], line 1
----> 1 pipeline.to("cuda")

File C:\ProgramData\Miniconda3\envs\hg_diffuser\lib\site-packages\diffusers\pipelines\pipeline_utils.py:396, in DiffusionPipeline.to(self, torch_device, silence_dtype_warnings)
383 if (
384 module.dtype == torch.float16
385 and str(torch_device) in ["cpu"]
386 and not silence_dtype_warnings
387 and not is_offloaded
388 ):
389 logger.warning(
390 "Pipelines loaded with torch_dtype=torch.float16 cannot run with cpu device. It"
391 " is not recommended to move them to cpu as running them will fail. Please make"
(…)
394 " torch_dtype=torch.float16 argument, or use another device for inference."
395 )
--> 396 module.to(torch_device)
397 return self

File C:\ProgramData\Miniconda3\envs\hg_diffuser\lib\site-packages\torch\nn\modules\module.py:1145, in Module.to(self, *args, **kwargs)
1141 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1142 non_blocking, memory_format=convert_to_format)
1143 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
--> 1145 return self._apply(convert)

File C:\ProgramData\Miniconda3\envs\hg_diffuser\lib\site-packages\torch\nn\modules\module.py:797, in Module._apply(self, fn)
795 def _apply(self, fn):
796 for module in self.children():
--> 797 module._apply(fn)
799 def compute_should_use_set_data(tensor, tensor_applied):
800 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
801 # If the new tensor has compatible tensor type as the existing tensor,
802 # the current behavior is to change the tensor in-place using .data =,
(…)
807 # global flag to let the user control whether they want the future
808 # behavior of overwriting the existing tensor or not.

File C:\ProgramData\Miniconda3\envs\hg_diffuser\lib\site-packages\torch\nn\modules\module.py:797, in Module._apply(self, fn)
795 def _apply(self, fn):
796 for module in self.children():
--> 797 module._apply(fn)
799 def compute_should_use_set_data(tensor, tensor_applied):
800 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
801 # If the new tensor has compatible tensor type as the existing tensor,
802 # the current behavior is to change the tensor in-place using .data =,
(…)
807 # global flag to let the user control whether they want the future
808 # behavior of overwriting the existing tensor or not.

[... skipping similar frames: Module._apply at line 797 (2 times)]

File C:\ProgramData\Miniconda3\envs\hg_diffuser\lib\site-packages\torch\nn\modules\module.py:797, in Module._apply(self, fn)
795 def _apply(self, fn):
796 for module in self.children():
--> 797 module._apply(fn)
799 def compute_should_use_set_data(tensor, tensor_applied):
800 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
801 # If the new tensor has compatible tensor type as the existing tensor,
802 # the current behavior is to change the tensor in-place using .data =,
(…)
807 # global flag to let the user control whether they want the future
808 # behavior of overwriting the existing tensor or not.

File C:\ProgramData\Miniconda3\envs\hg_diffuser\lib\site-packages\torch\nn\modules\module.py:820, in Module._apply(self, fn)
816 # Tensors stored in modules are graph leaves, and we don't want to
817 # track autograd history of param_applied, so we have to use
818 # with torch.no_grad():
819 with torch.no_grad():
--> 820 param_applied = fn(param)
821 should_use_set_data = compute_should_use_set_data(param, param_applied)
822 if should_use_set_data:

File C:\ProgramData\Miniconda3\envs\hg_diffuser\lib\site-packages\torch\nn\modules\module.py:1143, in Module.to.<locals>.convert(t)
1140 if convert_to_format is not None and t.dim() in (4, 5):
1141 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1142 non_blocking, memory_format=convert_to_format)
--> 1143 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

OutOfMemoryError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 4.00 GiB total capacity; 3.41 GiB already allocated; 0 bytes free; 3.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
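For reference, the max_split_size_mb hint at the end of the message is controlled through the PYTORCH_CUDA_ALLOC_CONF environment variable. A minimal sketch (128 is just an illustrative value; it has to be set before CUDA is first initialized):

```python
import os

# Must run before the first CUDA allocation, or the setting is ignored.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported afterwards so the allocator picks the setting up
```

Fragmentation tuning alone rarely rescues a 4 GB card for Stable Diffusion, though.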

Hey @winecoding, it looks like your GPU only has 4 GB of memory. We support a few memory optimizations, documented here: Memory and speed. I would recommend looking into accelerate offloading.
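A minimal sketch of what that can look like, assuming a Stable Diffusion checkpoint and that accelerate is installed (the model id here is illustrative, not prescriptive):

```python
import torch
from diffusers import DiffusionPipeline

# Half precision roughly halves the VRAM needed for the weights.
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)

# Instead of pipeline.to("cuda"): accelerate moves each submodule to the
# GPU only while it is executing, keeping peak VRAM low on small cards.
pipeline.enable_sequential_cpu_offload()

# Optional: compute attention in slices to trim peak memory further.
pipeline.enable_attention_slicing()

image = pipeline("an astronaut riding a horse").images[0]
```

Sequential offloading trades speed for memory; if it is too slow, enable_model_cpu_offload() on recent diffusers releases is a faster middle ground.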
