FluxPipeline error while loading Flux.1 dev

Normally, if you have enough VRAM, the code below should work. After searching around, though, this looks like a trickier error than I first thought.
That means Diffusers may not be the only cause.

I’m starting to suspect a mismatch between the installed CUDA libraries and the CUDA build of torch, or some other library that is misbehaving.
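
To check for that kind of mismatch, something like the sketch below can print what is actually installed. It is only a rough diagnostic, not an official tool: the helper names `package_version` and `collect_env_report` are made up for this example, and the list of packages it inspects is my assumption about what matters here.

```python
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def collect_env_report():
    """Collect version and GPU details that commonly explain pipeline load errors."""
    report = {
        "python": platform.python_version(),
        "torch": package_version("torch"),
        "diffusers": package_version("diffusers"),
        "transformers": package_version("transformers"),
        "accelerate": package_version("accelerate"),
    }
    try:
        import torch
    except Exception as exc:
        # A broken CUDA install can raise OSError here, not only ImportError.
        report["torch_import_error"] = f"{type(exc).__name__}: {exc}"
        return report

    # torch.version.cuda is None when a CPU-only build of torch is installed.
    report["torch_cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        report["gpu"] = props.name
        report["vram_gib"] = round(props.total_memory / 1024**3, 1)
        report["bf16_supported"] = torch.cuda.is_bf16_supported()
    return report


for key, value in collect_env_report().items():
    print(f"{key}: {value}")
```

If `torch_cuda_build` comes back as `None`, or `cuda_available` is `False` on a machine that has an NVIDIA GPU, the torch install is the first thing I would look at before blaming Diffusers.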

For example, do SD1.5 and SDXL models work with the same code?
In general, you only need to replace the model name in the code below.

# Run this once in a shell first: pip install -U diffusers
import torch
from diffusers import DiffusionPipeline

# Load FLUX.1-schnell with bfloat16 weights.
pipe = DiffusionPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
# Save some VRAM by offloading the model to the CPU. Remove this line if you have enough GPU memory.
pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
).images[0]
image.save("flux-schnell.png")
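
To narrow things down, the same code path can be tried on smaller models first. The sketch below is just one way to do it: the SD1.5 and SDXL repo IDs, the dtypes, the 4-step setting, and the helper names `try_model` and `run_all` are my own choices for illustration, and each model is a multi-gigabyte download.

```python
# Candidate models, smallest first. Repo IDs and dtypes are assumptions for this test.
CANDIDATES = [
    ("stable-diffusion-v1-5/stable-diffusion-v1-5", "float16"),
    ("stabilityai/stable-diffusion-xl-base-1.0", "float16"),
    ("black-forest-labs/FLUX.1-schnell", "bfloat16"),
]


def try_model(model_id, dtype_name, prompt="A cat holding a sign that says hello world"):
    """Load one model, generate one image, and return 'ok' or the error text."""
    try:
        # Imported lazily so that an import failure is reported like any other error.
        import torch
        from diffusers import DiffusionPipeline

        pipe = DiffusionPipeline.from_pretrained(
            model_id, torch_dtype=getattr(torch, dtype_name)
        )
        pipe.enable_model_cpu_offload()
        # Few steps: this is a smoke test, image quality does not matter.
        pipe(prompt, num_inference_steps=4).images[0]
        return "ok"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


def run_all():
    """Try every candidate in order and print one result line per model."""
    for model_id, dtype_name in CANDIDATES:
        print(f"{model_id}: {try_model(model_id, dtype_name)}")
```

Calling `run_all()` prints one line per model. If SD1.5 and SDXL fail the same way, the problem is probably in the torch or CUDA setup rather than in anything specific to Flux.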