Program not working on GPU but works on CPU

Diffusion pipelines are prone to numerical problems in float16; with SDXL in particular, the stock VAE overflows in half precision and typically produces NaNs or black/incomplete images. Loading the fp16-fix VAE that is already in your commented-out code is likely to make it work properly on the GPU.

import logging
import torch
from diffusers import AutoencoderKL, AutoPipelineForText2Image

# Optional: fp16-safe VAE (uncomment if needed)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16
).to("cuda")

try:
    pipeline = AutoPipelineForText2Image.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        variant="fp16",
        device_map="auto"
    )  # .to("cuda") is not needed when device_map is set
    logging.info("Pipeline loaded to GPU with float16.")
except Exception as e:
    logging.error(f"Failed to load model pipeline: {e}")
    raise

# If using VAE:
pipeline.vae = vae
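
As a sketch (assuming model_id points to an SDXL checkpoint, and the prompt and filename below are placeholders), you can also pass the fixed VAE directly to from_pretrained so the separate assignment isn't needed, then run a quick test generation to confirm the output is no longer black:

# Alternative: pass the fp16-safe VAE at load time instead of assigning it afterwards
pipeline = AutoPipelineForText2Image.from_pretrained(
    model_id,
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
    device_map="auto"
)

# Quick sanity check (prompt and filename are placeholders)
image = pipeline(prompt="a photo of a cat", num_inference_steps=25).images[0]
image.save("test.png")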