I’m currently working on deploying FLUX on a Hugging Face Inference Dedicated Endpoint, but I’ve run into a few challenges. I wanted to check if anyone here has had success deploying it.
If you’ve managed to do it, I’d love to hear about your setup, tips, and any resources you found helpful. Specifically, I’m looking for insights on model optimization, handling large parameter sizes, or any custom configurations you used.
Unfortunately, all I can offer is an example of the same failure, if you don’t mind.
I have never deployed to a Dedicated Endpoint myself, but the steps should be similar to deploying to the Serverless Inference API.
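For reference, this is roughly what calling such a model through the Serverless Inference API looks like from the client side. This is just a minimal sketch; the repo id is a placeholder, not a real model.

```python
# Minimal sketch: call the Serverless Inference API for text-to-image.
# "your-username/flux-diffusers" is a placeholder repo id.
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_...")  # your HF access token
image = client.text_to_image(
    "a cat wearing a space suit",
    model="your-username/flux-diffusers",
)
image.save("result.png")  # returns a PIL image
```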
As far as I know, both start with converting the safetensors file to Diffusers format and uploading it to the Hub.
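In case it helps, here is a rough sketch of that conversion step with diffusers, assuming a single-file FLUX checkpoint. The file name and repo id are placeholders, and depending on what the checkpoint actually contains you may need to supply the text encoders and VAE separately.

```python
# Rough sketch: convert a single-file FLUX checkpoint to Diffusers format
# and push the result to the Hub. File name and repo id are placeholders.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_single_file(
    "flux1-dev-finetune.safetensors",  # local checkpoint (placeholder name)
    torch_dtype=torch.bfloat16,
)
pipe.save_pretrained("flux-diffusers")             # multi-folder Diffusers layout
pipe.push_to_hub("your-username/flux-diffusers")   # upload to the Hub
```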
I’ve gotten that far, but no matter what I try, inference doesn’t work… It would work if the model were SDXL or SD1.5, and it would also work if the model were used often enough to be cached on the HF servers, like the official FLUX.1 dev repo from BFL itself, but that isn’t the case here.
I think loading it from a Zero GPU Space works fine, but that isn’t really an answer for an Endpoint, and why it fails there still doesn’t make sense to me.
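One thing that might be worth trying on the Dedicated Endpoint side is a custom handler (a handler.py at the root of the model repo), so the Endpoint loads the pipeline explicitly instead of relying on the default text-to-image pipeline. A minimal sketch, assuming the repo is already in Diffusers format and the endpoint hardware has enough VRAM for FLUX in bf16:

```python
# handler.py — minimal sketch of a custom Inference Endpoints handler
# that loads a Diffusers-format FLUX repo explicitly.
import base64
from io import BytesIO

import torch
from diffusers import FluxPipeline


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` is the local copy of the repo the Endpoint was created from.
        self.pipe = FluxPipeline.from_pretrained(path, torch_dtype=torch.bfloat16)
        self.pipe.to("cuda")

    def __call__(self, data: dict) -> dict:
        prompt = data.get("inputs", "")
        params = data.get("parameters", {}) or {}
        image = self.pipe(
            prompt,
            num_inference_steps=params.get("num_inference_steps", 28),
            guidance_scale=params.get("guidance_scale", 3.5),
        ).images[0]
        # Return the image as base64 so the response serializes cleanly as JSON.
        buffer = BytesIO()
        image.save(buffer, format="PNG")
        return {"image": base64.b64encode(buffer.getvalue()).decode("utf-8")}
```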