I like playing with image/video gen. I used to do it with ComfyUI, an amazing tool that's reasonably easy to use. I've been digging into ways to run Flux faster than ComfyUI on my Mac M1 with 8GB. 8GB isn't a lot, but it's doable, so I'm trying to save as much memory as possible, and a whole backend plus the advanced ComfyUI frontend didn't leave much RAM for the diffusion process itself. This is my code:
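The gist is roughly the following (trimmed to a minimal sketch; the model repo, GGUF file name, resolution, and offload settings here are placeholders, not the exact script):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Load the same quantized UNET/transformer GGUF file that ComfyUI uses
# (placeholder file name; point it at your local ComfyUI checkpoint).
transformer = FluxTransformer2DModel.from_single_file(
    "flux1-dev-Q4_K_S.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

# Stream sub-models between CPU and the MPS device to keep peak RAM low;
# depending on the diffusers/accelerate version this may need tweaking on a Mac.
pipe.enable_sequential_cpu_offload(device="mps")

image = pipe(
    "a photo of a cat",
    height=512,
    width=512,
    num_inference_steps=20,
    guidance_scale=3.5,
).images[0]
image.save("out.png")
```

It works and generates images at 100 s/it (instead of 300-400 s/it in ComfyUI), but I'd like to speed it up a bit by using a quantized T5-XXL, just as I did in ComfyUI (and I'd like to be able to use the same files as ComfyUI, as I have done with the UNET).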
Since T5EncoderModel is part of Transformers rather than Diffusers, it should be fine to load and use it as a Transformers model, but there may still be some bugs in the GGUF part.
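Something along these lines should work in principle (an untested sketch; the GGUF repo and file name below are just examples, swap in the file you already use with ComfyUI):

```python
import torch
from diffusers import FluxPipeline
from transformers import T5EncoderModel

# Transformers can read GGUF checkpoints directly; the weights are
# dequantized to the requested dtype on load (example repo/file names).
text_encoder_2 = T5EncoderModel.from_pretrained(
    "city96/t5-v1_1-xxl-encoder-gguf",
    gguf_file="t5-v1_1-xxl-encoder-Q4_K_M.gguf",
    torch_dtype=torch.bfloat16,
)

# Plug the encoder into the Flux pipeline in place of the default T5-XXL.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder_2=text_encoder_2,
    torch_dtype=torch.bfloat16,
)
```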
If you don't insist on using the same file, you could load a different file that uses a different quantization method…
One possibility would be to first dequantize it and then quantize it on the fly in a different format?
How about torchao, bitsandbytes, or optimum-quanto?
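With optimum-quanto, for instance, the re-quantization can be done on the fly after loading, which also covers the dequantize-then-requantize idea above (a sketch; the source checkpoint and the choice of qint8 vs. qfloat8 are up to you):

```python
import torch
from optimum.quanto import freeze, qint8, quantize
from transformers import T5EncoderModel

# Load the full-precision (or GGUF-dequantized) T5-XXL encoder first...
text_encoder_2 = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    torch_dtype=torch.bfloat16,
)

# ...then quantize the weights in place to int8 and freeze them.
quantize(text_encoder_2, weights=qint8)
freeze(text_encoder_2)
```

The same two calls can be applied to the transformer/UNET as well, since quanto operates on plain PyTorch modules.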
Diffusers + Transformers, ComfyUI, and A1111 WebUI are all completely different programs. Although their purposes and results are largely the same, their implementations differ. Compatibility is provided for convenience, but it is better to convert files in advance to avoid problems.
Hello
Thanks for the reply again.
I tried the GGUF text_encoder approach, but it didn't really save much time. The weights had to be dequantized and then quantized again (this happened automatically), so the time saved by the smaller file size (and thus less disk swapping) was lost to that pre-processing.
Hmm… Although it may defeat the purpose of using the same file, TorchAO or Quanto is what I'd recommend for speed optimization. If you want to keep using the same file, apply one of these (or bitsandbytes' NF4) on the fly. GGUF is convenient, but it offers little speed advantage beyond loading.
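The NF4 route would look roughly like this (a sketch only; note that bitsandbytes primarily targets CUDA, so support on an M1/MPS setup is limited):

```python
import torch
from transformers import BitsAndBytesConfig, T5EncoderModel

# Quantize the text encoder to NF4 on the fly while loading.
# Caveat: bitsandbytes mainly targets CUDA and may not run on Apple Silicon.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

text_encoder_2 = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
```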