Replace text encoder with a different encoder in Stable Diffusion

navidi · February 9, 2024, 4:52pm

Hi,

I am trying to train Stable Diffusion for image generation. Instead of the default text encoder, I have a function that takes the input prompt and returns an embedding, which is supposed to be used as the model's conditioning (in place of the default prompt encoding).

I was able to train the model by modifying the training script provided here: diffusers/examples/text_to_image/train_text_to_image.py (main branch of huggingface/diffusers on GitHub).

When I load the result with StableDiffusionPipeline.from_pretrained, the generated images look like the model is learning the pattern of the training samples. However, as I understand it, the loaded checkpoint does not take into account the condition embedding that the model actually needs; it just loads the default text encoder that was saved with the checkpoint. The reason is that, as far as I can tell, there is no input argument through which I can pass my condition encoder to StableDiffusionPipeline.from_pretrained().

Can someone please let me know whether it is possible to incorporate the condition for inference in this setting (or, if not, how I should modify it)? Any guidance on making it work would be highly appreciated.
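
To make the question concrete, here is a minimal sketch of one way this might work at inference. It relies on the pipeline call accepting precomputed embeddings through its `prompt_embeds` and `negative_prompt_embeds` arguments, which bypasses the text encoder that was loaded by default. The names `load_pipeline`, `generate` and `encode_condition` are hypothetical placeholders (the last one stands in for the custom embedding function). Using the embedding of an empty prompt as the unconditional input is also an assumption, and it should match whatever was done during training. The embeddings are assumed to be torch tensors of shape (batch, sequence length, hidden size) compatible with the UNet's cross-attention.

```python
def load_pipeline(checkpoint_dir):
    """Load the fine-tuned checkpoint as a regular Stable Diffusion pipeline."""
    # Imported lazily so that generate() below can be used without importing diffusers here.
    from diffusers import StableDiffusionPipeline

    return StableDiffusionPipeline.from_pretrained(checkpoint_dir)


def generate(pipe, encode_condition, prompt, **pipe_kwargs):
    """Generate images conditioned on a custom embedding instead of the text encoder output.

    encode_condition is a hypothetical stand-in for the custom function that maps
    a prompt to the conditioning embedding used during training.
    """
    # Conditioning for the actual prompt.
    cond = encode_condition(prompt)
    # Assumption: the empty prompt serves as the unconditional input for
    # classifier-free guidance; adjust to match the training setup.
    uncond = encode_condition("")
    # Passing embeddings directly means the pipeline's own text encoder is not used.
    out = pipe(prompt_embeds=cond, negative_prompt_embeds=uncond, **pipe_kwargs)
    return out.images
```

Usage would then look like `generate(load_pipeline(checkpoint_dir), encode_condition, "some prompt", num_inference_steps=50)`, where `checkpoint_dir` is the output directory of the training run.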

Please let me know if any clarification is needed.

Best,
