Program not working on GPU but works on CPU

Diffusion pipelines are prone to numerical problems in float16; with SDXL in particular, the stock VAE overflows in half precision and typically produces NaNs or black/incomplete images. Loading the fp16-fix VAE that is already in your commented-out code is likely to make it work properly on the GPU.

import logging
import torch
from diffusers import AutoencoderKL, AutoPipelineForText2Image

# Optional: fp16-safe VAE (uncomment if needed)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16
).to("cuda")

try:
    pipeline = AutoPipelineForText2Image.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        variant="fp16",
        device_map="auto"
    )  # .to("cuda") is not needed when device_map is set
    logging.info("Pipeline loaded to GPU with float16.")
except Exception as e:
    logging.error(f"Failed to load model pipeline: {e}")
    raise

# If using VAE:
pipeline.vae = vae
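
As a sketch (assuming model_id points to an SDXL checkpoint, and the prompt and filename below are placeholders), you can also pass the fixed VAE directly to from_pretrained so the separate assignment isn't needed, then run a quick test generation to confirm the output is no longer black:

# Alternative: pass the fp16-safe VAE at load time instead of assigning it afterwards
pipeline = AutoPipelineForText2Image.from_pretrained(
    model_id,
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
    device_map="auto"
)

# Quick sanity check (prompt and filename are placeholders)
image = pipeline(prompt="a photo of a cat", num_inference_steps=25).images[0]
image.save("test.png")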