Program not working on GPU but works on CPU

Hello

I am trying to run this program on the GPU, but it generates a black image. On the CPU it produces the expected output.

The program is as follows:

import logging
from diffusers import AutoPipelineForText2Image, AutoencoderKL
import torch
import numpy as np
import random
import os
from PIL import Image

# =========================
# STEP 0: Logging Setup
# =========================
log_file = "generation_log.txt"
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(log_file),
        logging.StreamHandler()
    ]
)

logging.info("Initializing...")

# =========================
# STEP 1: Environment Setup
# =========================

torch.cuda.empty_cache()
torch.cuda.ipc_collect()

seed = random.randint(0, 9999999)
torch.manual_seed(seed)
np.random.seed(seed)
logging.info(f"Using seed: {seed}")

# ===============================
# STEP 2: Model and LoRA Setup
# ===============================
logging.info("Loading base model and LoRA weights...")

model_dir = "D:\\Ganu\\AIImage\\huggingface\\kohya_ss\\kohya_ss\\outputs"
lora_weights_path = os.path.join(model_dir, "model")
model_id = "stabilityai/stable-diffusion-xl-base-1.0"

# Optional: Custom VAE (uncomment if needed)
# vae = AutoencoderKL.from_pretrained(
#     "madebyollin/sdxl-vae-fp16-fix",
#     torch_dtype=torch.float16
# ).to("cuda")

try:
    pipeline = AutoPipelineForText2Image.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        variant="fp16"
    ).to("cuda")
    logging.info("Pipeline loaded to GPU with float16.")
except Exception as e:
    logging.error(f"Failed to load model pipeline: {e}")
    raise

# If using VAE:
# pipeline.vae = vae

pipeline.enable_attention_slicing()
pipeline.enable_vae_slicing()

try:
    pipeline.load_lora_weights(lora_weights_path, weight_name="last.safetensors")
    logging.info("LoRA weights loaded successfully.")
except ValueError as e:
    logging.error("Invalid LoRA checkpoint. Check the format or compatibility.")
    raise e

# =========================
# STEP 3: Prompt Inference
# =========================
text_prompt = (
    "A wide, breathtaking landscape with all real vibrant nature-themed background, lush forests, mountains, and a Doctor standing prominently in the foreground"
)

negative_prompt = (
    "text, letters, words, signage, logos, labels, writing, messy background, busy layout, clutter, double faces, abstract shapes, UI panels with words, overlapping elements, header, footer, top bar, navigation bar, bottom menu, toolbar, top text, website layout, browser frame, button row, page border, UI bar"
)

logging.info(f"Running inference with prompt: {text_prompt}")

try:
    result = pipeline(
        prompt=text_prompt,
        negative_prompt=negative_prompt,
        guidance_scale=7.5,
        num_inference_steps=30
    )
    generated_image = result.images[0]
    output_path = f"generated_image_{seed}.png"
    generated_image.save(output_path)
    logging.info(f"Image saved to: {output_path}")
    generated_image.show()
except Exception as e:
    logging.error(f"Error during image generation: {e}")
    raise

The environment details are as follows:

C:\Users\ADMIN>nvidia-smi
Wed May 14 15:17:51 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 566.36                 Driver Version: 566.36         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650      WDDM  |   00000000:01:00.0  On |                  N/A |
| N/A   62C    P0             32W /   50W |    3833MiB /   4096MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      3008    C+G   ...n\NVIDIA app\CEF\NVIDIA Overlay.exe      N/A      |
|    0   N/A  N/A      6232    C+G   ...b3d8bbwe\Microsoft.Media.Player.exe      N/A      |
|    0   N/A  N/A     10308    C+G   ...oogle\Chrome\Application\chrome.exe      N/A      |
|    0   N/A  N/A     15020    C+G   ...n\NVIDIA app\CEF\NVIDIA Overlay.exe      N/A      |
|    0   N/A  N/A     16140    C+G   C:\Windows\explorer.exe                     N/A      |
|    0   N/A  N/A     17036    C+G   ...siveControlPanel\SystemSettings.exe      N/A      |
|    0   N/A  N/A     17088    C+G   ...oogle\Chrome\Application\chrome.exe      N/A      |
|    0   N/A  N/A     17732    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A      |
|    0   N/A  N/A     19012    C+G   ...on\135.0.3179.98\msedgewebview2.exe      N/A      |
|    0   N/A  N/A     19720    C+G   ...t.LockApp_cw5n1h2txyewy\LockApp.exe      N/A      |
|    0   N/A  N/A     20816    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A     20948    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe      N/A      |
|    0   N/A  N/A     21008    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe      N/A      |
|    0   N/A  N/A     22108    C+G   ...5n1h2txyewy\ShellExperienceHost.exe      N/A      |
|    0   N/A  N/A     23296      C   ...gface\kohya_ss\Python310\python.exe      N/A      |
|    0   N/A  N/A     24012    C+G   ...137.0_x64__dt26b99r8h8gj\RtkUWP.exe      N/A      |
+-----------------------------------------------------------------------------------------+
(venv) D:\Ganu\AIImage\project\Train-10Images-chatgptParameters\runs\1sstrun-23thApril2025\generation\1stGo>python
Python 3.10.10 (tags/v3.10.10:aad5f6a, Feb  7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(f"CUDA Available: {torch.cuda.is_available()}")
CUDA Available: True
>>> print(f"GPU Name: {torch.cuda.get_device_name(0)}")
GPU Name: NVIDIA GeForce GTX 1650
>>> print(f"PyTorch Version: {torch.__version__}")
PyTorch Version: 2.7.0+cu118
>>> print(f"CUDA Version: {torch.version.cuda}")
CUDA Version: 11.8

Any pointers?

P.S.:

  1. The GPU version was working before, but I cleaned my computer, removing several apps, possibly some DLLs, and programs like Microsoft Visual Studio.

  2. I tried consulting ChatGPT and Grok; their suggestions got the CPU version working, but not the GPU.

  3. The logs don't show any errors.


Diffusion models tend to run into various problems with float16 (including failures that complete without any error). As your commented-out code already hints, using the optional fp16-fix VAE will likely make it work properly.

# Custom VAE (fp16-fix)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16
).to("cuda")

try:
    pipeline = AutoPipelineForText2Image.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        variant="fp16",
        device_map="auto"
    )#.to("cuda")
    logging.info("Pipeline loaded to GPU with float16.")
except Exception as e:
    logging.error(f"Failed to load model pipeline: {e}")
    raise

# If using VAE:
pipeline.vae = vae

Hello

I am getting the following error after modifying the code:


(venv) D:\Ganu\AIImage\project\Train-10Images-chatgptParameters\runs\1sstrun-23thApril2025\generation\1stGo>python John-Training-12thFeb2025-original.py
2025-05-15 09:56:28,925 - INFO - Initializing...
2025-05-15 09:56:28,951 - INFO - Using seed: 1421589
2025-05-15 09:56:28,951 - INFO - Loading base model and LoRA weights...
2025-05-15 09:56:29,970 - ERROR - Failed to load model pipeline: auto not supported. Supported strategies are: balanced
Traceback (most recent call last):
  File "D:\Ganu\AIImage\project\Train-10Images-chatgptParameters\runs\1sstrun-23thApril2025\generation\1stGo\John-Training-12thFeb2025-original.py", line 52, in <module>
    pipeline = AutoPipelineForText2Image.from_pretrained(
  File "D:\Ganu\AIImage\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "D:\Ganu\AIImage\venv\lib\site-packages\diffusers\pipelines\auto_pipeline.py", line 443, in from_pretrained
    return text_2_image_cls.from_pretrained(pretrained_model_or_path, **kwargs)
  File "D:\Ganu\AIImage\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "D:\Ganu\AIImage\venv\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 745, in from_pretrained
    raise NotImplementedError(
NotImplementedError: auto not supported. Supported strategies are: balanced

The program is as follows:

import logging
from diffusers import AutoPipelineForText2Image, AutoencoderKL
import torch
import numpy as np
import random
import os
from PIL import Image

# =========================
# STEP 0: Logging Setup
# =========================
log_file = "generation_log.txt"
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(log_file),
        logging.StreamHandler()
    ]
)

logging.info("Initializing...")

# =========================
# STEP 1: Environment Setup
# =========================

torch.cuda.empty_cache()
torch.cuda.ipc_collect()

seed = random.randint(0, 9999999)
torch.manual_seed(seed)
np.random.seed(seed)
logging.info(f"Using seed: {seed}")

# ===============================
# STEP 2: Model and LoRA Setup
# ===============================
logging.info("Loading base model and LoRA weights...")

model_dir = "D:\\Ganu\\AIImage\\huggingface\\kohya_ss\\kohya_ss\\outputs"
lora_weights_path = os.path.join(model_dir, "model")
model_id = "stabilityai/stable-diffusion-xl-base-1.0"

# Custom VAE (fp16-fix)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16
).to("cuda")

try:
    pipeline = AutoPipelineForText2Image.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        variant="fp16",
        device_map="auto"
    ).to("cuda")
    logging.info("Pipeline loaded to GPU with float16.")
except Exception as e:
    logging.error(f"Failed to load model pipeline: {e}")
    raise

# If using VAE:
pipeline.vae = vae

pipeline.enable_attention_slicing()
pipeline.enable_vae_slicing()

try:
    pipeline.load_lora_weights(lora_weights_path, weight_name="last.safetensors")
    logging.info("LoRA weights loaded successfully.")
except ValueError as e:
    logging.error("Invalid LoRA checkpoint. Check the format or compatibility.")
    raise e

# =========================
# STEP 3: Prompt Inference
# =========================
text_prompt = (
    "A wide, breathtaking landscape with all real vibrant nature-themed background, lush forests, mountains, and a Doctor standing prominently in the foreground"
)

negative_prompt = (
    "text, letters, words, signage, logos, labels, writing, messy background, busy layout, clutter, double faces, abstract shapes, UI panels with words, overlapping elements, header, footer, top bar, navigation bar, bottom menu, toolbar, top text, website layout, browser frame, button row, page border, UI bar"
)

logging.info(f"Running inference with prompt: {text_prompt}")

try:
    result = pipeline(
        prompt=text_prompt,
        negative_prompt=negative_prompt,
        guidance_scale=7.5,
        num_inference_steps=30
    )
    generated_image = result.images[0]
    output_path = f"generated_image_{seed}.png"
    generated_image.save(output_path)
    logging.info(f"Image saved to: {output_path}")
    generated_image.show()
except Exception as e:
    logging.error(f"Error during image generation: {e}")
    raise

And I get the same error with "balanced" too!


Hmm… So simply this?

# Custom VAE (fp16-fix)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16
).to("cuda")

try:
    pipeline = AutoPipelineForText2Image.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        variant="fp16",
        #device_map="auto"
    ).to("cuda")
    logging.info("Pipeline loaded to GPU with float16.")
except Exception as e:
    logging.error(f"Failed to load model pipeline: {e}")
    raise

# pipeline.enable_model_cpu_offload()

# If using VAE:
pipeline.vae = vae

Still getting a plain black image, but no errors.

import logging
from diffusers import AutoPipelineForText2Image, AutoencoderKL
import torch
import numpy as np
import random
import os
from PIL import Image

# =========================
# STEP 0: Logging Setup
# =========================
log_file = "generation_log.txt"
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(log_file),
        logging.StreamHandler()
    ]
)

logging.info("Initializing...")

# =========================
# STEP 1: Environment Setup
# =========================

torch.cuda.empty_cache()
torch.cuda.ipc_collect()

seed = random.randint(0, 9999999)
torch.manual_seed(seed)
np.random.seed(seed)
logging.info(f"Using seed: {seed}")

# ===============================
# STEP 2: Model and LoRA Setup
# ===============================
logging.info("Loading base model and LoRA weights...")

model_dir = "D:\\Ganu\\AIImage\\huggingface\\kohya_ss\\kohya_ss\\outputs"
lora_weights_path = os.path.join(model_dir, "model")
model_id = "stabilityai/stable-diffusion-xl-base-1.0"

# Custom VAE (fp16-fix)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16
).to("cuda")

try:
    pipeline = AutoPipelineForText2Image.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        variant="fp16",
        #device_map="auto"
    ).to("cuda")
    logging.info("Pipeline loaded to GPU with float16.")
except Exception as e:
    logging.error(f"Failed to load model pipeline: {e}")
    raise

# pipeline.enable_model_cpu_offload()

# If using VAE:
pipeline.vae = vae

pipeline.enable_attention_slicing()
pipeline.enable_vae_slicing()

try:
    pipeline.load_lora_weights(lora_weights_path, weight_name="last.safetensors")
    logging.info("LoRA weights loaded successfully.")
except ValueError as e:
    logging.error("Invalid LoRA checkpoint. Check the format or compatibility.")
    raise e

# =========================
# STEP 3: Prompt Inference
# =========================
text_prompt = (
    "A wide, breathtaking landscape with all real vibrant nature-themed background, lush forests, mountains, and a Doctor standing prominently in the foreground"
)

negative_prompt = (
    "text, letters, words, signage, logos, labels, writing, messy background, busy layout, clutter, double faces, abstract shapes, UI panels with words, overlapping elements, header, footer, top bar, navigation bar, bottom menu, toolbar, top text, website layout, browser frame, button row, page border, UI bar"
)

logging.info(f"Running inference with prompt: {text_prompt}")

try:
    result = pipeline(
        prompt=text_prompt,
        negative_prompt=negative_prompt,
        guidance_scale=7.5,
        num_inference_steps=30
    )
    generated_image = result.images[0]
    output_path = f"generated_image_{seed}.png"
    generated_image.save(output_path)
    logging.info(f"Image saved to: {output_path}")
    generated_image.show()
except Exception as e:
    logging.error(f"Error during image generation: {e}")
    raise

Hmm… Perhaps a LoRA loading issue…?

import logging
from diffusers import AutoPipelineForText2Image, AutoencoderKL
import torch
import numpy as np
import random
import os
from PIL import Image

# =========================
# STEP 0: Logging Setup
# =========================
log_file = "generation_log.txt"
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(log_file),
        logging.StreamHandler()
    ]
)

logging.info("Initializing...")

# =========================
# STEP 1: Environment Setup
# =========================

torch.cuda.empty_cache()
torch.cuda.ipc_collect()

seed = random.randint(0, 9999999)
torch.manual_seed(seed)
np.random.seed(seed)
logging.info(f"Using seed: {seed}")

# ===============================
# STEP 2: Model and LoRA Setup
# ===============================
logging.info("Loading base model and LoRA weights...")

model_dir = "D:\\Ganu\\AIImage\\huggingface\\kohya_ss\\kohya_ss\\outputs"
lora_weights_path = os.path.join(model_dir, "model")
model_id = "stabilityai/stable-diffusion-xl-base-1.0"

# Custom VAE (fp16-fix)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16
).to("cuda")

try:
    pipeline = AutoPipelineForText2Image.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")
    logging.info("Pipeline loaded to GPU with float16.")
except Exception as e:
    logging.error(f"Failed to load model pipeline: {e}")
    raise

#pipeline.enable_model_cpu_offload()

# If using VAE:
pipeline.vae = vae

pipeline.enable_attention_slicing()
pipeline.enable_vae_slicing()

"""try:
    pipeline.load_lora_weights(lora_weights_path, weight_name="last.safetensors")
    logging.info("LoRA weights loaded successfully.")
except ValueError as e:
    logging.error("Invalid LoRA checkpoint. Check the format or compatibility.")
    raise e
"""

# =========================
# STEP 3: Prompt Inference
# =========================
text_prompt = (
    "A wide, breathtaking landscape with all real vibrant nature-themed background, lush forests, mountains, and a Doctor standing prominently in the foreground"
)

negative_prompt = (
    "text, letters, words, signage, logos, labels, writing, messy background, busy layout, clutter, double faces, abstract shapes, UI panels with words, overlapping elements, header, footer, top bar, navigation bar, bottom menu, toolbar, top text, website layout, browser frame, button row, page border, UI bar"
)

logging.info(f"Running inference with prompt: {text_prompt}")

try:
    result = pipeline(
        prompt=text_prompt,
        negative_prompt=negative_prompt,
        guidance_scale=7.5,
        num_inference_steps=30
    )
    generated_image = result.images[0]
    output_path = f"generated_image_{seed}.png"
    generated_image.save(output_path)
    logging.info(f"Image saved to: {output_path}")
    generated_image.show()
except Exception as e:
    logging.error(f"Error during image generation: {e}")
    raise

Hello

Getting the same black image without LoRA.

Python & Torch Details:

(venv) D:\Ganu\AIImage\project\Train-10Images-chatgptParameters\runs\1sstrun-23thApril2025\generation\1stGo>python
Python 3.10.10 (tags/v3.10.10:aad5f6a, Feb  7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>>
>>> # Check if torch was built with CUDA
>>> print("Is CUDA available? :", torch.cuda.is_available())
Is CUDA available? : True
>>> print("CUDA version (torch compiled with):", torch.version.cuda)
CUDA version (torch compiled with): 11.8
>>> print("Torch built with CUDA support:", torch.backends.cuda.is_built())
Torch built with CUDA support: True

Code:

import logging
from diffusers import AutoPipelineForText2Image, AutoencoderKL
import torch
import numpy as np
import random
import os
from PIL import Image

# =========================
# STEP 0: Logging Setup
# =========================
log_file = "generation_log.txt"
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(log_file),
        logging.StreamHandler()
    ]
)

logging.info("Initializing...")

# =========================
# STEP 1: Environment Setup
# =========================

torch.cuda.empty_cache()
torch.cuda.ipc_collect()

seed = random.randint(0, 9999999)
torch.manual_seed(seed)
np.random.seed(seed)
logging.info(f"Using seed: {seed}")

# ===============================
# STEP 2: Model and LoRA Setup
# ===============================
logging.info("Loading base model and LoRA weights...")

model_dir = "D:\\Ganu\\AIImage\\huggingface\\kohya_ss\\kohya_ss\\outputs"
lora_weights_path = os.path.join(model_dir, "model")
model_id = "stabilityai/stable-diffusion-xl-base-1.0"

# Custom VAE (fp16-fix)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16
).to("cuda")

try:
    pipeline = AutoPipelineForText2Image.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")
    logging.info("Pipeline loaded to GPU with float16.")
except Exception as e:
    logging.error(f"Failed to load model pipeline: {e}")
    raise

#pipeline.enable_model_cpu_offload()

# If using VAE:
pipeline.vae = vae

pipeline.enable_attention_slicing()
pipeline.enable_vae_slicing()

"""try:
    pipeline.load_lora_weights(lora_weights_path, weight_name="last.safetensors")
    logging.info("LoRA weights loaded successfully.")
except ValueError as e:
    logging.error("Invalid LoRA checkpoint. Check the format or compatibility.")
    raise e
"""

# =========================
# STEP 3: Prompt Inference
# =========================
text_prompt = (
    "A wide, breathtaking landscape with all real vibrant nature-themed background, lush forests, mountains, and a Doctor standing prominently in the foreground"
)

negative_prompt = (
    "text, letters, words, signage, logos, labels, writing, messy background, busy layout, clutter, double faces, abstract shapes, UI panels with words, overlapping elements, header, footer, top bar, navigation bar, bottom menu, toolbar, top text, website layout, browser frame, button row, page border, UI bar"
)

logging.info(f"Running inference with prompt: {text_prompt}")

try:
    result = pipeline(
        prompt=text_prompt,
        negative_prompt=negative_prompt,
        guidance_scale=7.5,
        num_inference_steps=30
    )
    generated_image = result.images[0]
    output_path = f"generated_image_{seed}.png"
    generated_image.save(output_path)
    logging.info(f"Image saved to: {output_path}")
    generated_image.show()
except Exception as e:
    logging.error(f"Error during image generation: {e}")
    raise

Environment:

1. Libraries

pip list
Package            Version
------------------ ------------------
accelerate         0.21.0
aiofiles           24.1.0
annotated-types    0.7.0
anyio              4.9.0
certifi            2025.1.31
charset-normalizer 3.4.1
click              8.1.8
colorama           0.4.6
deepspeed          0.10.0+f5c834a6
diffusers          0.21.4
exceptiongroup     1.2.2
fastapi            0.115.12
ffmpy              0.5.0
filelock           3.18.0
flash-attention    1.0.0
fsspec             2025.3.2
gradio             5.27.1
gradio_client      1.9.1
groovy             0.1.2
h11                0.16.0
hjson              3.1.0
httpcore           1.0.9
httpx              0.28.1
huggingface-hub    0.16.4
idna               3.10
importlib_metadata 8.6.1
Jinja2             3.1.6
markdown-it-py     3.0.0
MarkupSafe         3.0.2
mdurl              0.1.2
mpmath             1.3.0
mypy_extensions    1.1.0
networkx           3.4.2
ninja              1.11.1.4
numpy              1.23.1
orjson             3.10.16
packaging          25.0
pandas             2.2.3
peft               0.15.2
pillow             11.2.1
pip                25.1.1
psutil             7.0.0
py-cpuinfo         9.0.0
pydantic           1.10.13
pydantic_core      2.33.1
pydub              0.25.1
Pygments           2.19.1
pyre-extensions    0.0.29
python-dateutil    2.9.0.post0
python-multipart   0.0.20
pytz               2025.2
PyYAML             6.0.2
regex              2024.11.6
requests           2.32.3
rich               14.0.0
ruff               0.11.7
safehttpx          0.1.6
safetensors        0.5.3
semantic-version   2.10.0
setuptools         65.5.0
shellingham        1.5.4
six                1.17.0
sniffio            1.3.1
starlette          0.46.2
sympy              1.14.0
tokenizers         0.13.3
tomlkit            0.13.2
torch              2.4.0+cu118
torchaudio         2.7.0+cu118
torchvision        0.22.0+cu118
tqdm               4.67.1
transformers       4.31.0
typer              0.15.3
typing_extensions  4.13.2
typing-inspect     0.9.0
typing-inspection  0.4.0
tzdata             2025.2
urllib3            2.4.0
uvicorn            0.34.2
websockets         15.0.1
xformers           0.0.27.post2+cu118
zipp               3.21.0

2. nvidia-smi output
(venv) D:\Ganu\AIImage\project\Train-10Images-chatgptParameters\runs\1sstrun-23thApril2025\generation\1stGo>nvidia-smi
Thu May 15 14:36:50 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 576.40                 Driver Version: 576.40         CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650      WDDM  |   00000000:01:00.0  On |                  N/A |
| N/A   47C    P8              5W /   50W |     698MiB /   4096MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            5648    C+G   ...IA app\CEF\NVIDIA Overlay.exe      N/A      |
|    0   N/A  N/A            6664    C+G   ...we\Microsoft.Media.Player.exe      N/A      |
|    0   N/A  N/A            7348    C+G   ...Chrome\Application\chrome.exe      N/A      |
|    0   N/A  N/A            7560    C+G   ...Chrome\Application\chrome.exe      N/A      |
|    0   N/A  N/A            7940    C+G   ....0.3240.64\msedgewebview2.exe      N/A      |
|    0   N/A  N/A            9104    C+G   C:\Windows\explorer.exe               N/A      |
|    0   N/A  N/A            9520    C+G   ...h_cw5n1h2txyewy\SearchApp.exe      N/A      |
|    0   N/A  N/A            9728    C+G   ...ntrolPanel\SystemSettings.exe      N/A      |
|    0   N/A  N/A           11968    C+G   ...h_cw5n1h2txyewy\SearchApp.exe      N/A      |
|    0   N/A  N/A           14064    C+G   ...5n1h2txyewy\TextInputHost.exe      N/A      |
|    0   N/A  N/A           15292    C+G   ...IA app\CEF\NVIDIA Overlay.exe      N/A      |
+-----------------------------------------------------------------------------------------+

Hmm… Or CUDA Toolkit version issue?

accelerate                1.0.1
diffusers                 0.32.2
torch                     2.4.0+cu124
transformers              4.49.0.dev0

The simple code

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float32
).to("cuda")

prompt = "A clear sunny landscape with mountains and a river"
image = pipe(prompt=prompt).images[0]
image.save("test_image.png")

returns the error

(venv) D:\Ganu\AIImage\project\Train-10Images-chatgptParameters\runs\1sstrun-23thApril2025\generation\1stGo>python John-Training-15thMay2025.py
[2025-05-15 14:53:54,560] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
D:\Ganu\AIImage\venv\lib\site-packages\deepspeed-0.10.0+f5c834a6-py3.10.egg\deepspeed\runtime\zero\linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(ctx, input, weight, bias=None):
D:\Ganu\AIImage\venv\lib\site-packages\deepspeed-0.10.0+f5c834a6-py3.10.egg\deepspeed\runtime\zero\linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx, grad_output):
Loading pipeline components...: 100%|██████████| 7/7 [00:02<00:00,  2.48it/s]
Traceback (most recent call last):
  File "D:\Ganu\AIImage\project\Train-10Images-chatgptParameters\runs\1sstrun-23thApril2025\generation\1stGo\John-Training-15thMay2025.py", line 7, in <module>
    ).to("cuda")
  File "D:\Ganu\AIImage\venv\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 733, in to
    module.to(torch_device, torch_dtype)
  File "D:\Ganu\AIImage\venv\lib\site-packages\transformers\modeling_utils.py", line 1900, in to
    return super().to(*args, **kwargs)
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 1174, in to
    return self._apply(convert)
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 805, in _apply
    param_applied = fn(param)
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 1160, in convert
    return t.to(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 242.00 MiB. GPU 0 has a total capacity of 4.00 GiB of which 0 bytes is free. Of the allocated memory 10.41 GiB is allocated by PyTorch, and 262.02 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

and ChatGPT says:

You're running out of VRAM on your GPU (4 GB total, 0 bytes free), and you're trying to load Stable Diffusion XL (SDXL), which requires at least 12 GB of GPU memory, ideally more. The error is expected: SDXL is far too heavy for a 4 GB GPU.

P.S.:

  1. The CPU version still works.
  2. And I remember the GPU version working earlier.
  3. Should I change the CUDA Toolkit version?

Upgrading the CUDA Toolkit is a last resort, so let's try these first. Also, the above error may simply be due to insufficient VRAM…

pip install -U accelerate
pip install diffusers==0.32.2 transformers<=4.48.3
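
Also, if VRAM really is the bottleneck, sequential CPU offload usually lets SDXL run on a 4 GB card, just slowly. A minimal sketch of the simple test script, using enable_sequential_cpu_offload() instead of .to("cuda"):

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)

# Stream submodules to the GPU one at a time instead of moving the whole
# pipeline with .to("cuda"); much slower, but needs far less VRAM.
pipe.enable_sequential_cpu_offload()

prompt = "A clear sunny landscape with mountains and a river"
image = pipe(prompt=prompt).images[0]
image.save("test_image.png")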

pip install diffusers==0.32.2 transformers<=4.48.3
The system cannot find the file specified.


Hmm… on Windows, cmd.exe treats "<" as input redirection (hence "The system cannot find the file specified"), so quote the specifier or install them separately:

pip install diffusers==0.32.2
pip install transformers==4.48.3

(venv) D:\Ganu\AIImage\project\Train-10Images-chatgptParameters\runs\1sstrun-23thApril2025\generation\1stGo>python John-Training-15thMay2025.py
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.4.0+cu118 with CUDA 1108 (you have 2.7.0+cu118)
Python 3.10.11 (you have 3.10.10)
Please reinstall xformers (see https://github.com/facebookresearch/xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
Loading pipeline components...: 100%|██████████| 7/7 [00:02<00:00, 3.32it/s]
Traceback (most recent call last):
  File "D:\Ganu\AIImage\project\Train-10Images-chatgptParameters\runs\1sstrun-23thApril2025\generation\1stGo\John-Training-15thMay2025.py", line 7, in <module>
    ).to("cuda")
  File "D:\Ganu\AIImage\venv\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 461, in to
    module.to(device, dtype)
  File "D:\Ganu\AIImage\venv\lib\site-packages\transformers\modeling_utils.py", line 3110, in to
    return super().to(*args, **kwargs)
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 1355, in to
    return self._apply(convert)
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 915, in _apply
    module._apply(fn)
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 915, in _apply
    module._apply(fn)
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 915, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 942, in _apply
    param_applied = fn(param)
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 1341, in convert
    return t.to(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacity of 4.00 GiB of which 0 bytes is free. Of the allocated memory 10.66 GiB is allocated by PyTorch, and 239.73 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)


WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:

Oh…

pip uninstall xformers
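
If you want to keep xFormers instead, reinstalling a build that matches the installed torch (2.7.0+cu118 here) should also clear the warning; per the xFormers README, something like:

pip install -U xformers --index-url https://download.pytorch.org/whl/cu118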

(venv) D:\Ganu\AIImage\project\Train-10Images-chatgptParameters\runs\1sstrun-23thApril2025\generation\1stGo>python John-Training-15thMay2025.py
Loading pipeline components...: 100%|██████████| 7/7 [00:02<00:00, 3.35it/s]
Traceback (most recent call last):
  File "D:\Ganu\AIImage\project\Train-10Images-chatgptParameters\runs\1sstrun-23thApril2025\generation\1stGo\John-Training-15thMay2025.py", line 7, in <module>
    ).to("cuda")
  File "D:\Ganu\AIImage\venv\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 461, in to
    module.to(device, dtype)
  File "D:\Ganu\AIImage\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 1077, in to
    return super().to(*args, **kwargs)
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 1355, in to
    return self._apply(convert)
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 915, in _apply
    module._apply(fn)
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 915, in _apply
    module._apply(fn)
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 915, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 942, in _apply
    param_applied = fn(param)
  File "D:\Ganu\AIImage\venv\lib\site-packages\torch\nn\modules\module.py", line 1341, in convert
    return t.to(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 14.00 MiB. GPU 0 has a total capacity of 4.00 GiB of which 0 bytes is free. Of the allocated memory 10.67 GiB is allocated by PyTorch, and 229.42 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)


Hmm…

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    # torch_dtype=torch.float32  # too large for a 4 GB card
    torch_dtype=torch.float16
).to("cuda")

prompt = "A clear sunny landscape with mountains and a river"
image = pipe(prompt=prompt).images[0]
image.save("test_image.png")

Still getting a plain black image with the following code:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    # torch_dtype=torch.float32  # too large for a 4 GB card
    torch_dtype=torch.float16
).to("cuda")

prompt = "A clear sunny landscape with mountains and a river"
image = pipe(prompt=prompt).images[0]
image.save("test_image.png")


I found an old GTX 10x0-related issue…

from diffusers import DiffusionPipeline, AutoencoderKL
import torch

torch.backends.cudnn.benchmark = True # https://github.com/huggingface/diffusers/issues/1556
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", vae=vae, variant="fp16", torch_dtype=torch.float16).to("cuda")

prompt = "A clear sunny landscape with mountains and a river"
image = pipe(prompt=prompt).images[0]
image.save("test_image.png")

A green blank image is being created with the above code…


Green…? It's surreal, but it's good to see a change. This might be a CUDA Toolkit version issue after all. I'll do some searching.
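
In the meantime, one way to narrow it down: ask the pipeline for the raw latents (output_type="latent") and check them for NaNs before the VAE decode runs. A minimal sketch with the same model:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "A clear sunny landscape with mountains and a river"

# Skip the VAE decode and return the raw latents instead of a PIL image.
latents = pipe(prompt=prompt, output_type="latent").images

# NaN/Inf here means the UNet itself overflows in fp16;
# clean latents would point at the VAE decode step instead.
print("NaNs:", torch.isnan(latents).any().item())
print("Infs:", torch.isinf(latents).any().item())

If the latents come back clean, the usual next step is keeping the fp16-fix VAE in place or forcing the decode to run in float32 (diffusers exposes vae.config.force_upcast for this).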
