How do I use a fine-tuned pre-trained text-to-image model?

I am developing an application in which I want to use a text-to-image generation model. I have finished fine-tuning the Hugging Face "StableDiffusion" model, and it gives me satisfactory results. However, when I use the model from the front end, it does generate output but performance is very poor. As far as I understand, each request runs the training pipeline again before generating the image, which takes a lot of time: today it took around 9 hours to generate two images. I urgently need a solution to this problem.
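For context, the usual way to serve a fine-tuned model is to load the saved weights once and reuse the same pipeline object for every request, with no training code on the request path. The following is only a minimal sketch of that pattern, assuming the fine-tuned model was saved in the Diffusers format; `MODEL_DIR`, `get_pipeline`, and `generate` are hypothetical names, not taken from my actual code.

```python
from functools import lru_cache

# Hypothetical path: the directory where the fine-tuned pipeline was saved.
MODEL_DIR = "path/to/fine-tuned-model"


@lru_cache(maxsize=1)
def get_pipeline():
    """Load the fine-tuned pipeline on first use and cache it for later calls."""
    # Heavy imports are deferred so that importing this module stays cheap.
    import torch
    from diffusers import StableDiffusionPipeline

    # Use the GPU with half precision when available, otherwise fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    pipe = StableDiffusionPipeline.from_pretrained(MODEL_DIR, torch_dtype=dtype)
    return pipe.to(device)


def generate(prompt: str, steps: int = 30):
    """Run inference only; no training happens here."""
    pipe = get_pipeline()  # cached after the first call
    return pipe(prompt, num_inference_steps=steps).images[0]
```

The front end would then call `generate("some prompt")` per request. Only the first call pays the cost of loading the weights; if generation is still slow after that, the time is going into inference itself (for example, running on CPU) rather than into reloading or retraining.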