Hmm… So simply this?
import logging

import torch
from diffusers import AutoencoderKL, AutoPipelineForText2Image

# Optional: custom VAE. The stock SDXL VAE can produce NaNs in float16;
# madebyollin/sdxl-vae-fp16-fix is a patched version that is fp16-safe.
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16,
).to("cuda")

try:
    pipeline = AutoPipelineForText2Image.from_pretrained(
        model_id,  # assumed to be defined earlier, e.g. an SDXL checkpoint id
        torch_dtype=torch.float16,
        variant="fp16",
        # device_map="auto"
    ).to("cuda")
    logging.info("Pipeline loaded to GPU with float16.")
except Exception as e:
    logging.error(f"Failed to load model pipeline: {e}")
    raise

# If VRAM is tight, call this *instead of* .to("cuda") above:
# pipeline.enable_model_cpu_offload()

# Swap in the fp16-safe VAE:
pipeline.vae = vae
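
Yes, that's the core of it. For completeness, here is a minimal usage sketch once the pipeline is loaded; the prompt and sampling parameters below are illustrative, not from the original:

# Minimal sketch: generate one image with the loaded pipeline.
# Prompt and parameters are placeholders; tune them for your use case.
image = pipeline(
    prompt="a photo of an astronaut riding a horse on mars",
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]
image.save("output.png")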