CUDA out of memory in SD3

I am working on a cloth-swapping project in which I use GroundedSAM (to fetch cloth masks) and then a Stable Diffusion 3 pipeline with a ControlNet (the variant supported for SD3).

The problem is that when I create the pipeline object (pipe) for the pretrained SD3 model and move it to my available device ("cuda" in my case), it throws a CUDA out of memory error.

I have tried omitting the CUDA mapping and running the next parts, but then it complains that the outputs are on different devices and must be on the same device. I tried the model in Google Colab (pay-as-you-go, 16 GB VRAM) and also in RunPod with 24 GB VRAM. In both cases, I get the same error. I am attaching a small reproducible code sample for this.
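In case the attachment does not come through, the relevant part of my script looks roughly like this (a simplified sketch; the checkpoint names stand in for the ones I actually load):

```python
import torch
from diffusers import SD3ControlNetModel, StableDiffusion3ControlNetPipeline

# SD3-compatible ControlNet (checkpoint name is illustrative)
controlnet = SD3ControlNetModel.from_pretrained(
    "InstantX/SD3-Controlnet-Canny", torch_dtype=torch.float16
)

# SD3 pipeline built around the ControlNet
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)

# This is the line that raises OutOfMemoryError
pipe.to("cuda")
```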

Note: RoboFlow GroundedSAM is also running in the same environment.

I am also attaching a picture of my error (it originates right after the SD3 pipeline is created):

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 23.64 GiB of which 2.81 MiB is free. Process 2832761 has 23.63 GiB memory in use. Of the allocated memory 22.72 GiB is allocated by PyTorch, and 456.70 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I am using the RoboFlow GroundedSAM (which can only run in Google Colab). Given this situation, can you suggest some solutions? Should I go for a GPU with more VRAM, like 32 GB on RunPod, or is there some other issue?

I have tried methods like clearing the CUDA cache and tuning the CUDA allocator configuration, but they didn't work.
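Concretely, this is roughly what I tried, following the hint at the end of the error message (the max_split_size_mb value is just one I experimented with):

```python
import gc
import os

# Allocator tuning must be set before the first CUDA allocation
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

# Drop Python references, then release PyTorch's cached blocks
gc.collect()
torch.cuda.empty_cache()

print(f"{torch.cuda.memory_allocated() / 1024**3:.2f} GiB allocated")
print(f"{torch.cuda.memory_reserved() / 1024**3:.2f} GiB reserved")
```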


It seems that there is a mysterious bug specific to SD3.5 Medium.

Yes, but in my case I have GroundedSAM plus Stable Diffusion 3 inpainting with a ControlNet for SD3.


The pipeline is mostly the same, so I think problems that occur in one place are likely to occur elsewhere as well.

That said, SD 3.5 Medium is larger than I expected; the text encoder (T5) in particular is large. With this setup, it might be difficult to stay within the desired VRAM budget without quantization.
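For example, here is a rough sketch of loading the T5 encoder in 8-bit before building the pipeline, assuming a recent diffusers plus a bitsandbytes install (the exact repo ID depends on which SD3 checkpoint you use):

```python
import torch
from diffusers import StableDiffusion3Pipeline
from transformers import BitsAndBytesConfig, T5EncoderModel

model_id = "stabilityai/stable-diffusion-3.5-medium"  # or the SD3 medium repo

# Load only the big T5-XXL text encoder in 8-bit via bitsandbytes
text_encoder_3 = T5EncoderModel.from_pretrained(
    model_id,
    subfolder="text_encoder_3",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

pipe = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    text_encoder_3=text_encoder_3,
    torch_dtype=torch.float16,
    device_map="balanced",  # 8-bit modules cannot simply be moved with .to("cuda")
)

# Cheaper alternative: drop T5 entirely (with a prompt-adherence trade-off)
# pipe = StableDiffusion3Pipeline.from_pretrained(
#     model_id, text_encoder_3=None, tokenizer_3=None, torch_dtype=torch.float16
# )
# pipe.enable_model_cpu_offload()
```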

I am already setting the torch_dtype argument to torch.float16, I run the pipeline on one image at a time, and I have reduced the image height and width to 640 instead of the optimal 1024.

Despite these customizations, the model still runs out of memory. On top of that, I have GroundedSAM in the same notebook, which by itself requires a minimum of 12-14 GB of VRAM. Using more than 24 GB of VRAM is not an option, as it is not feasible for the clients.
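One thing I could still try is to run the two models strictly one after the other and release GroundedSAM from the GPU before the SD3 pipeline is created, roughly like this (run_grounded_sam and sam_model are hypothetical stand-ins for the objects in my notebook):

```python
import gc
import torch

# 1) Run GroundedSAM and keep only its outputs (the cloth masks);
#    run_grounded_sam / sam_model are hypothetical placeholders
masks = run_grounded_sam(image)

# 2) Drop every reference to the GroundedSAM weights and free the GPU
del sam_model
gc.collect()
torch.cuda.empty_cache()

# 3) Only now build the SD3 pipeline, so it gets the whole GPU to itself
```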

However, can you give me an approximate amount of GPU memory required for the complete process (GroundedSAM + SD3 with ControlNet)? Or can you suggest some other ways of doing the inpainting?
