Replace Stable Diffusion class-conditional text with rows of attributes

tpremoli · January 27, 2024, 12:19am

I want to use Stable Diffusion model weights to generate class-conditional images. However, I don't want these images to be conditioned on a text prompt, but rather on a set of binary class attributes (one row of attributes per image).

To do this, I was thinking of using Diffusers, as it seemed the most straightforward option. My idea was to replace the CLIP text encoder/tokenizer with a custom encoder that maps each attribute row into the embedding space the UNet is conditioned on (the space the CLIP text embeddings normally occupy). However, I can't find any resources on this online, and I was wondering whether it is possible or feasible within the Diffusers library.
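
A minimal sketch of such an encoder, assuming PyTorch and Stable Diffusion 1.x, whose UNet cross-attends to a sequence of 768-dimensional CLIP hidden states (1024 for SD 2.x). `AttributeEncoder` is a hypothetical module, not part of Diffusers, and the one-token-per-attribute design is just one possible choice: each attribute gets a learned "absent" vector and a learned "present" vector, so the output has the same (batch, sequence, dim) layout as the text encoder's output and could take its place as `encoder_hidden_states`.

```python
import torch
import torch.nn as nn


class AttributeEncoder(nn.Module):
    """Hypothetical replacement for the CLIP text encoder.

    Maps binary attribute rows of shape (batch, num_attributes) to a token
    sequence of shape (batch, num_attributes, embed_dim), one token per attribute.
    """

    def __init__(self, num_attributes: int, embed_dim: int = 768):
        super().__init__()
        self.num_attributes = num_attributes
        # Two learned vectors per attribute: row 2*i means "attribute i absent",
        # row 2*i + 1 means "attribute i present".
        self.table = nn.Embedding(2 * num_attributes, embed_dim)
        # Normalize tokens, loosely mirroring the final LayerNorm of CLIP's text encoder.
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, attrs: torch.Tensor) -> torch.Tensor:
        if attrs.shape[-1] != self.num_attributes:
            raise ValueError(
                f"expected {self.num_attributes} attributes, got {attrs.shape[-1]}"
            )
        # Offset each attribute's 0/1 value into its own pair of embedding rows.
        offsets = 2 * torch.arange(self.num_attributes, device=attrs.device)
        idx = offsets + attrs.long()   # (batch, num_attributes)
        tokens = self.table(idx)       # (batch, num_attributes, embed_dim)
        return self.norm(tokens)


if __name__ == "__main__":
    encoder = AttributeEncoder(num_attributes=40)  # 40 is an arbitrary example
    attrs = torch.randint(0, 2, (4, 40))           # 4 images, 40 binary attributes each
    cond = encoder(attrs)
    print(cond.shape)  # torch.Size([4, 40, 768])
```

Because the pretrained UNet was trained on CLIP embeddings, these tokens will not mean anything to it until the encoder (and probably the UNet, or at least its cross-attention layers) has been trained on attribute-labelled images. For sampling, `StableDiffusionPipeline` accepts precomputed conditioning through its `prompt_embeds` and `negative_prompt_embeds` arguments, so the stock pipeline may not need to be replaced entirely.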

I understand that StableDiffusionPipeline is likely too rigid for this, but I was wondering how I would define a model that uses these attribute rows as the conditioning for generation, and how such a model could be trained or fine-tuned.

Topic	Category	Replies	Views	Last activity
Replace text encoder with a different encoder in Stable Diffusion	🧨 Diffusers	0	1429	February 9, 2024
A couple of super basic questions	🧨 Diffusers	3	1634	November 7, 2022
How to condition Stable-Diffusion on CLIP image embeddings?	🧨 Diffusers	0	1295	February 4, 2024
Class-conditional image generation	🧨 Diffusers	1	1135	October 12, 2023
Add additional conditioning info	🧨 Diffusers	21	8278	March 3, 2025
