Google Colab doesn't seem very welcoming toward image generation, so there could be several causes, but Flux generation usually seems to work there, so it's probably not a restriction or anything like that.
Assuming you are using the Flux dev version, the only parameters that tend to cause problems on the free tier of Colab are the image width and num_inference_steps. Try reducing these.
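For example, in diffusers something like this keeps the resolution and step count low (the numbers here are just a starting point, and the repo name assumes the gated FLUX.1-dev checkpoint):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # keeps VRAM usage down on a 16GB T4

image = pipe(
    "a photo of a cat",
    width=512,               # well below the 1024x1024 default
    height=512,
    num_inference_steps=20,  # fewer denoising steps than usual
    guidance_scale=3.5,
).images[0]
image.save("out.png")
```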
If the settings are outright impossible you will get an error, but if the runtime decides it might be able to finish given enough time, it will simply keep running forever.
Also, if RAM, VRAM, or disk space runs out, it may stall without raising an error, and the behavior in that case is similar.
Could it be the free version of Colab?
If so, with 16GB of VRAM it will take ages to generate; I'm not even sure it would finish within an hour.
Sorry that the page is in Japanese, but you can try it with a quantized Flux as shown on the following page.
I think you can find the same know-how in English if you search for it.
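For reference, here is a rough sketch of one way to run a 4-bit-quantized Flux transformer in diffusers. This assumes a recent diffusers with bitsandbytes installed; the linked page may use a different method.

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

# Quantize only the big transformer to NF4 to fit in limited VRAM.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
```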
Thanks a lot for the suggestions. Actually, it turns out that Colab does not always grant a GPU runtime the way it always provides a CPU one. It should work when and if a GPU is available. I was wondering whether we could make any fundamental changes to the FlowMatchEulerDiscreteSchedulerOutput class in the scheduling_flow_match_euler_discrete module.
Is the idea to dynamically switch whether CUDA is used?
The scheduler is not an ordinary iterator or anything like that; it is in charge of the denoising schedule, so unless you know it well, modifying it will ruin the image. Strictly speaking they are not quite the same thing, but it roughly corresponds to what WebUI and ComfyUI call a sampler.
A better approach might be to check whether CUDA is usable right before inference and, if not, give up.
You should be able to check that with torch.cuda.is_available().
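A minimal sketch of that check (the function name and the step count are just placeholders):

```python
import torch
from diffusers import DiffusionPipeline

def generate_or_abort(pipe: DiffusionPipeline, prompt: str):
    """Run inference only if a CUDA device is actually available."""
    if not torch.cuda.is_available():
        # Bail out instead of silently falling back to an endless CPU run.
        raise RuntimeError("No CUDA device available; refusing to run on CPU.")
    pipe.to("cuda")
    return pipe(prompt, num_inference_steps=20).images[0]
```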
Running even an SD 1.5 model on CPU alone is not realistic, at least for now, let alone a Flux model.
It is at least possible to offload some of the weights to RAM when VRAM is insufficient, and pipe.enable_model_cpu_offload() is enough for that.
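For example, with pipe being an already-constructed FluxPipeline:

```python
pipe.enable_model_cpu_offload()        # move each sub-model to the GPU only while it runs
# or, much slower but with a smaller VRAM footprint:
# pipe.enable_sequential_cpu_offload()
```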
So you want to speed up inference itself? Then wouldn't it be easier to use something like Hyper-FLUX? The page below says SD, but you can also find the SDXL and Flux LoRAs there.
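As a rough sketch of how that LoRA would be wired in with diffusers (the repo name, weight file name, and fuse scale are taken from memory of the ByteDance/Hyper-SD model card, so double-check them there):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
# Repo and weight_name are assumptions -- check the Hyper-SD model card for exact names.
pipe.load_lora_weights(
    "ByteDance/Hyper-SD", weight_name="Hyper-FLUX.1-dev-8steps-lora.safetensors"
)
pipe.fuse_lora(lora_scale=0.125)  # scale suggested on the model card, if I recall correctly
pipe.enable_model_cpu_offload()

# The whole point is that far fewer steps are needed:
image = pipe("a photo of a cat", num_inference_steps=8, guidance_scale=3.5).images[0]
```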
Also, a couple of techniques have recently been proposed on HF, though they may not be as easy to use as Hyper.